Abstract — An information network with in-block memory
(NiBM) is a generalization of a discrete memoryless network 
(DMN) where blocks of symbols may have memory inside each 
block. A cut-set bound is developed for NiBMs that unifies, 
strengthens, and generalizes existing cut bounds. The bound 
gives finite-letter capacity expressions for several classes of 
networks including point-to-point channels with IBM, and certain 
multiaccess, broadcast, and relay channels with IBM. Cardinality 
bounds on the random coding alphabets are developed that 
improve on existing bounds for channels with action-dependent 
state available causally at the encoder and for relays without 
delay. Finally, digital network coding is shown to achieve rates 
within a limited gap of the new cut-set bound for linear, additive, 
Gaussian noise channels, symmetric power constraints, and a 
multicast session. 

Index Terms — capacity, feedback, information, relay channels, 
networks 



I. Introduction 

An information network with in-block memory (NiBM) is an extension of a discrete memoryless network (DMN). Recall that a DMN with $K$ nodes has each node $k$, $k = 1,2,\ldots,K$, dealing with four types of random variables [1].

• Messages $W_{km}$, $m = 1,2,\ldots,M_k$, that have entropy $H(W_{km}) = B_{km}$ bits, where $M_k$ is the number of messages at node $k$. The rate of message $W_{km}$ is thus $R_{km} = B_{km}/n$ bits per channel use. The $\{W_{km}\}$ are mutually statistically independent for all $m$ and $k$.

• Channel inputs $X_{k,i}$, $i = 1,2,\ldots,n$, with alphabet $\mathcal{X}_k$. We interpret $i$ as a time index but it could alternatively represent frequency or space, for example.

• Channel outputs $Y_{k,i}$, $i = 1,2,\ldots,n$, with alphabet $\mathcal{Y}_k$.

• Message estimates $\hat{W}^{(k)}_{\ell m}$, $\ell m \in \mathcal{D}(k)$, where $\mathcal{D}(k)$ is a decoding index set whose elements are selected pairs $\ell m$, $\ell \neq k$, of message indices from other nodes.

Let $\mathcal{K} = \{1,2,\ldots,K\}$ be the set of nodes; let $\mathcal{E}(k) = \{k1, k2, \ldots, kM_k\}$ be the encoding index set of node $k$; let $Y_k^i = Y_{k,1} Y_{k,2} \cdots Y_{k,i}$; and let $r(x,y)$ be the remainder when $x$ is divided by $y$. For a set $\mathcal{S} \subseteq \mathcal{K}$ we write $\mathcal{E}(\mathcal{S}) = \bigcup_{k \in \mathcal{S}} \mathcal{E}(k)$ and $X_{\mathcal{S},i} = \{X_{k,i} : k \in \mathcal{S}\}$. For a set $\mathcal{S}$ of integer pairs $km$ we write $W_{\mathcal{S}} = \{W_{km} : km \in \mathcal{S}\}$. The relationships between the random variables are as follows.
• Node $k$ chooses functions $a_{k,i}$, $i = 1,2,\ldots,n$, such that

$$X_{k,i} = a_{k,i}\big(W_{\mathcal{E}(k)}, Y_k^{i-1}\big). \qquad (1)$$

For a finite alphabet $\mathcal{Y}_k$ one may interpret $a_k^n(w_{\mathcal{E}(k)}, \cdot)$ as a code function (or code tree or adaptive code word or strategy) for the messages $w_{\mathcal{E}(k)}$ (see [2, Sec. 15] and [?, Ch. 9]). We write $a_k^n(W_{\mathcal{E}(k)}, \cdot)$ as $A_k^n(W_{\mathcal{E}(k)}, \cdot)$ to emphasize that $A_k^n$ is a random function.

• Node $k$ puts out

$$\hat{W}^{(k)}_{\mathcal{D}(k)} = d_k\big(W_{\mathcal{E}(k)}, Y_k^n\big) \qquad (2)$$

for some decoding function $d_k$.

• A DMN channel is memoryless. A NiBM, however, may have in-block memory of length $L$ in the sense that at time $i$ node $k$ sees

$$Y_{k,i} = f_{k,t(i)+1}\big(X_{\mathcal{K},i-t(i)}, \ldots, X_{\mathcal{K},i}, Z_{\lceil i/L \rceil}\big) \qquad (3)$$

$k = 1,2,\ldots,K$, where $t(i) = r(i-1, L)$, and where the $Z_j$, $j = 1,2,\ldots,\lceil n/L \rceil$, are statistically independent realizations of a random variable $Z$ with alphabet $\mathcal{Z}$.

The channel functions $f_{k,i}(\cdot)$, $i = 1,2,\ldots,L$, may be different, and the alphabets $\mathcal{X}_{k,i}$ and $\mathcal{Y}_{k,i}$, $i = 1,2,\ldots,L$, may be different also. The notation $\mathcal{X}_k^L$ means $\mathcal{X}_{k,1} \times \mathcal{X}_{k,2} \times \cdots \times \mathcal{X}_{k,L}$.
Example 1: Consider a two-way channel with IBM of length $L = 2$. The channel puts out

$$Y_{k,1} = f_{k,1}(X_{1,1}, X_{2,1}, Z_1)$$
$$Y_{k,2} = f_{k,2}(X_{1,1}, X_{1,2}, X_{2,1}, X_{2,2}, Z_1)$$
$$Y_{k,3} = f_{k,1}(X_{1,3}, X_{2,3}, Z_2)$$
$$Y_{k,4} = f_{k,2}(X_{1,3}, X_{1,4}, X_{2,3}, X_{2,4}, Z_2)$$

for $k = 1,2$ and $n = 4$. A functional dependence graph (FDG) for this case is shown in Fig. 1, where the nodes $W_1$, $W_2$, $Z_1$, $Z_2$ with hollow circles represent mutually statistically
independent random variables HJ, Q. 
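The index bookkeeping in (3) can be checked mechanically. The Python sketch below (the function name `channel_dependencies` is our own, hypothetical choice) lists, for each time $i$, the channel function index $t(i)+1$, the input times $i-t(i), \ldots, i$, and the block-noise index $\lceil i/L \rceil$; for $n = 4$ and $L = 2$ it reproduces the dependencies of Example 1.

```python
import math


def channel_dependencies(n, L):
    """For i = 1..n, return (i, t(i)+1, input times, block index) following eq. (3).

    Y_{k,i} depends on X_{K,i-t(i)}, ..., X_{K,i} through f_{k,t(i)+1}
    and on the block noise Z_{ceil(i/L)}, where t(i) = r(i-1, L).
    """
    deps = []
    for i in range(1, n + 1):
        t = (i - 1) % L                          # t(i) = r(i-1, L)
        input_times = list(range(i - t, i + 1))  # inputs from the current block only
        block = math.ceil(i / L)                 # index of the noise Z used at time i
        deps.append((i, t + 1, input_times, block))
    return deps


for i, f_idx, times, block in channel_dependencies(n=4, L=2):
    print(f"Y_k,{i} = f_k,{f_idx}(inputs at times {times}, Z_{block})")
```

Each output depends only on inputs from its own block, which is exactly the in-block memory property; setting `L=1` recovers a memoryless channel.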

This paper develops information theory for NiBMs. Our main goal is to show that NiBMs are useful because much existing theory for DMNs extends naturally to NiBMs. Furthermore, NiBMs let us unify, strengthen, and generalize theory for several classes of networks, in particular for relay networks with delays. We believe that the framework of NiBMs is easier to understand than that of such specialized networks.

The document is organized as follows. Section II defines the capacity region of a NiBM and introduces notation. Section III states our main technical result: a cut-set bound on reliable communication rates. Sections IV and V apply the bound to point-to-point and multiuser channels. We derive a few new capacity theorems and cardinality bounds on random variables. Section VI extends these approaches to relay networks. Several proofs are developed in the Appendices.
II. Preliminaries

A. Capacity 

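The capacity region of a NiBM is the closure of the set of rate-tuples $(R_{km} : 1 \le k \le K,\ 1 \le m \le M_k)$ such that for any positive $\epsilon$ there is an $n$ and code functions and decoders for which the error probability

$$P_e = \Pr\Big[\bigcup_{k} \bigcup_{\ell m \in \mathcal{D}(k)} \big\{\hat{W}^{(k)}_{\ell m} \neq W_{\ell m}\big\}\Big] \qquad (4)$$

is at most $\epsilon$.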



B. Causal Conditioning and Directed Information 

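We use notation from [1] for causal conditioning and directed information. The probability of $x^L$ causally conditioned on $y^L$, and the same probability additionally conditioned on $a$, are defined as

$$P(x^L \| y^L) := \prod_{i=1}^{L} P(x_i | x^{i-1}, y^i) \qquad (5)$$
$$P(x^L \| y^L | a) := \prod_{i=1}^{L} P(x_i | x^{i-1}, y^i, a). \qquad (6)$$

As done here, we will drop subscripts on probability distributions if the argument is the lowercase version of the random variable. Causally-conditioned entropy is defined as

$$H(X^L \| Y^L) := \sum_{i=1}^{L} H(X_i | X^{i-1} Y^i) \qquad (7)$$
$$H(X^L \| Y^L | A) := \sum_{i=1}^{L} H(X_i | X^{i-1} Y^i A). \qquad (8)$$

Directed information is written as

$$I(X^L \to Y^L) = H(Y^L) - H(Y^L \| X^L) \qquad (9)$$
$$I(X^L \to Y^L \| Z^L) = H(Y^L \| Z^L) - H(Y^L \| X^L, Z^L) \qquad (10)$$
$$I(X^L \to Y^L \| Z^L | A) \qquad (11)$$
$$\quad = H(Y^L \| Z^L | A) - H(Y^L \| X^L, Z^L | A). \qquad (12)$$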
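As an illustration of (7) and (9), the following Python sketch (our own helper names, not code from the paper) computes $I(X^L \to Y^L)$ for a finite joint pmf by summing the chain-rule terms $H(Y_i | Y^{i-1})$ and $H(Y_i | Y^{i-1} X^i)$. The example contrasts directed information with mutual information for an "anticipatory" map in which $Y_1 = X_2$.

```python
from collections import defaultdict
from math import log2


def cond_entropy(p, target, given):
    """H(target(O) | given(O)) in bits for a pmf p: outcome -> probability."""
    joint, marg = defaultdict(float), defaultdict(float)
    for o, prob in p.items():
        joint[(given(o), target(o))] += prob
        marg[given(o)] += prob
    return -sum(pr * log2(pr / marg[g]) for (g, _), pr in joint.items() if pr > 0)


def directed_information(p, L):
    """I(X^L -> Y^L) = H(Y^L) - H(Y^L || X^L); outcomes are (x_tuple, y_tuple)."""
    h_y = sum(cond_entropy(p, lambda o, i=i: o[1][i], lambda o, i=i: o[1][:i])
              for i in range(L))
    # causal conditioning as in (7): H(Y_i | Y^{i-1} X^i), where X^i includes X_i
    h_y_causal = sum(cond_entropy(p, lambda o, i=i: o[1][i],
                                  lambda o, i=i: (o[1][:i], o[0][:i + 1]))
                     for i in range(L))
    return h_y - h_y_causal


def mutual_information(p):
    """I(X^L; Y^L), for comparison."""
    return (cond_entropy(p, lambda o: o[1], lambda o: ())
            - cond_entropy(p, lambda o: o[1], lambda o: o[0]))


# Uniform X^2 and an anticipatory map with Y_1 = X_2 and Y_2 = 0.
p = {((x1, x2), (x2, 0)): 0.25 for x1 in (0, 1) for x2 in (0, 1)}
print(directed_information(p, 2), mutual_information(p))
```

Here the directed information is zero while the mutual information is one bit: $Y_1$ reveals a future input, which causal conditioning does not credit.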

C. Further Notation 

The functional dependence (1) is expressed as $1(x_{k,i} | a_k^n, y_k^{i-1})$ in place of $P(x_{k,i} | a_k^n, y_k^{i-1})$. We similarly write $1(x_k^n \| a_k^n, 0y_k^{n-1})$ in place of $P(x_k^n \| a_k^n, 0y_k^{n-1})$. It will be convenient to split symbol strings into blocks of length $L$. We use the notation

$$\bar{x}_{k,m} = x_{k,L(m-1)+1}\, x_{k,L(m-1)+2} \cdots x_{k,L(m-1)+L}$$
$$\bar{y}_{k,m} = y_{k,L(m-1)+1}\, y_{k,L(m-1)+2} \cdots y_{k,L(m-1)+L}$$

for the $m$th blocks of $x_k^n$ and $y_k^n$, and similarly for other strings.

We write $\mathrm{supp}(P_X)$ for the support set of $P_X(\cdot)$. We write the binary entropy function as $H_2(\cdot)$ and differential entropy as $h(\cdot)$. Logarithms are taken to the base 2.

D. Channel Distribution 

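We have defined the channel using the functions $f_{k,i}(\cdot)$. It will be convenient to alternatively define the channel by a probability distribution. Consider $P(a_{\mathcal{K}}^n, x_{\mathcal{K}}^n, y_{\mathcal{K}}^n)$ that factors as

$$\Big[\prod_{k=1}^{K} P(a_k^n)\, 1(x_k^n \| a_k^n, 0y_k^{n-1})\Big] P(y_{\mathcal{K}}^n \| x_{\mathcal{K}}^n). \qquad (13)$$

The $P(y_{\mathcal{K}}^n \| x_{\mathcal{K}}^n)$ further factors into $m = \lceil n/L \rceil$ blocks as

$$P(y_{\mathcal{K}}^n \| x_{\mathcal{K}}^n) = \Big[\prod_{j=1}^{m-1} P_{Y_{\mathcal{K}}^L \| X_{\mathcal{K}}^L}(\bar{y}_{\mathcal{K},j} \| \bar{x}_{\mathcal{K},j})\Big]\, P_{Y_{\mathcal{K}}^{L'} \| X_{\mathcal{K}}^{L'}}(\bar{y}_{\mathcal{K},m} \| \bar{x}_{\mathcal{K},m}) \qquad (14)$$

where the last block has length $L' = n - (m-1)L$. We focus on the case $n = mL$ so that $L' = L$ and all blocks have length $L$. The expression (14) means that we may define the NiBM channel by using the block-invariant distribution $P(y_{\mathcal{K}}^L \| x_{\mathcal{K}}^L)$ rather than by using $Z$ and the functions in (3).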



E. Linear Channels

We consider several examples where the channel alphabets are the field $\mathbb{F}$. We write the channel inputs and outputs as vectors $\underline{X}_k = [X_{k,1} \ldots X_{k,L}]^T$ and $\underline{Y}_k = [Y_{k,1} \ldots Y_{k,L}]^T$, respectively. For instance, a scalar, linear, and additive-noise channel has

$$\underline{Y}_k = \sum_{\ell=1}^{K} G_{k\ell}\, \underline{X}_\ell + \underline{Z}_k \qquad (15)$$

where the $G_{k\ell}$ are $L \times L$ lower-triangular matrices and the $\underline{Z}_k$ are random vectors such that $\underline{Z}_{\mathcal{K}}$ is independent of $\underline{X}_{\mathcal{K}}$. We write the covariance matrix of a random vector $\underline{X}$ as $Q_{\underline{X}}$ and its determinant as $|Q_{\underline{X}}|$.

III. Cut-Set Bound

We develop a cut-set bound for NiBMs that generalizes the classic cut-set bound for DMNs. Consider a set $\mathcal{S}$ of nodes and let $\mathcal{S}^c$ be the complement of $\mathcal{S}$ in $\mathcal{K} = \{1,2,\ldots,K\}$. We say that $(\mathcal{S}, \mathcal{S}^c)$ is a cut separating a message $W_{km}$ and its estimate $\hat{W}^{(\ell)}_{km}$ if $k \in \mathcal{S}$ and $\ell \in \mathcal{S}^c$. Let $\mathcal{M}(\mathcal{S})$ be the set of indexes (which are integer pairs $km$) of those messages separated from one of their estimates by the cut $(\mathcal{S}, \mathcal{S}^c)$, and let $R_{\mathcal{M}}(\mathcal{S})$ be the sum of the rates of these messages.

There is a subtlety in that the NiBM can have high mutual information at the start of each block and low mutual information at the end of each block. This could mean, e.g., that using the channel once is better than using it a large number of times. To avoid such issues, we require that the channel is used $n = mL$ times for a positive integer $m$. Alternatively, we could require that $n$ be much larger than $L$. We have the following result, which we prove in Appendix A.

Theorem 1: The capacity region $\mathcal{C}$ of a NiBM of length $L$ that is used a multiple of $L$ times satisfies

$$\mathcal{C} \subseteq \bigcup_{P_{A_{\mathcal{K}}^L}} \bigcap_{\mathcal{S} \subseteq \mathcal{K}} \mathcal{R}(P_{A_{\mathcal{K}}^L}, \mathcal{S}) \qquad (16)$$

where $\mathcal{R}(P_{A_{\mathcal{K}}^L}, \mathcal{S})$ is the set of non-negative rate-tuples satisfying

$$R_{\mathcal{M}}(\mathcal{S}) \le I(A_{\mathcal{S}}^L; Y_{\mathcal{S}^c}^L | A_{\mathcal{S}^c}^L)/L. \qquad (17)$$

The joint probability distribution $P(a_{\mathcal{K}}^L, x_{\mathcal{K}}^L, y_{\mathcal{K}}^L)$ factors as

$$P(a_{\mathcal{K}}^L) \Big[\prod_{k=1}^{K} 1(x_k^L \| a_k^L, 0y_k^{L-1})\Big] P(y_{\mathcal{K}}^L \| x_{\mathcal{K}}^L). \qquad (18)$$

Remark 1: The code functions in Theorem 1 are statistically dependent. This is different than in Sec. II-D, where the code functions are independent (see Fig. 1 and (13)). Similarly, Shannon's outer bound for the two-way channel [2, Eq. (36)] and the classic cut-set bound for DMNs [?], [?, Ch. 10], [5, Sec. 15.10], [6, p. 477] have statistically dependent inputs (see Sec. III-B).

Remark 2: The $1(x_k^L \| a_k^L, 0y_k^{L-1})$, $k = 1,2,\ldots,K$, are fixed functions and $P(y_{\mathcal{K}}^L \| x_{\mathcal{K}}^L)$ is fixed by the channel.

Remark 3: Fixing $P(y_{\mathcal{K}}^L \| x_{\mathcal{K}}^L)$ fixes $P(y_{\mathcal{K}}^L | a_{\mathcal{K}}^L)$. We may thus view the channel as being $P(y_{\mathcal{K}}^L | a_{\mathcal{K}}^L)$ for the purposes of deriving achievable rates and computing the cut-set bound.

Remark 4: $I(A_{\mathcal{S}}^L; Y_{\mathcal{S}^c}^L | A_{\mathcal{S}^c}^L)$ is concave in $P_{A_{\mathcal{K}}^L}$. This result follows by the concavity of $I(A;B|C=c)$ in $P_{A|C=c}$ when $P_{B|AC=c}$ is held fixed, and because $P(y_{\mathcal{K}}^L | a_{\mathcal{K}}^L)$ is fixed.

Remark 5: The $A_k^L$ are not "auxiliary" random variables, i.e., they are explicit components of the communication problem just like the channel inputs $X_k^L$. Moreover, the cardinalities $|\mathcal{A}_k^L|$ are automatically bounded by the channel alphabets.

Remark 6: Average per-letter cost constraints can be dealt with in the usual way (see Remark 29). More precisely, if we have $J$ cost functions $s_j(\cdot)$ and constraints

$$\frac{1}{n} \sum_{i=1}^{n} E\big[s_j(X_{\mathcal{K},i}, Y_{\mathcal{K},i})\big] \le \Gamma_j, \quad j = 1,2,\ldots,J \qquad (19)$$

then one may add the requirement that the union in (16) is over distributions (18) that satisfy

$$\frac{1}{L} \sum_{i=1}^{L} E\big[s_j(X_{\mathcal{K},i}, Y_{\mathcal{K},i})\big] \le \Gamma_j, \quad j = 1,2,\ldots,J. \qquad (20)$$

One may treat average per-block cost constraints similarly.

Remark 7: A bound in [7, Thm. 2] and [8, Thm. 2] is similar to (17) for the special cases of relay networks with delays and causal relay networks without messages at the causal relays. We discuss these bounds in more detail in Remark 26 below.
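The bookkeeping behind $\mathcal{M}(\mathcal{S})$ and $R_{\mathcal{M}}(\mathcal{S})$ is purely combinatorial. The Python sketch below uses hypothetical helper names and an assumed data layout (messages indexed by pairs `(k, m)`, each node's decoding set $\mathcal{D}(\ell)$ stored as a Python set); it enumerates all nonempty proper cuts and the rate sum that the corresponding constraint (17) must bound.

```python
from itertools import combinations


def separated_messages(K, decoders, S):
    """M(S): messages km with k in S that are decoded at some node l outside S."""
    S = set(S)
    outside = set(range(1, K + 1)) - S
    return {km for l in outside for km in decoders.get(l, set()) if km[0] in S}


def cut_rate_sums(K, rates, decoders):
    """Map every nonempty proper subset S of {1..K} to R_M(S)."""
    sums = {}
    for size in range(1, K):
        for S in combinations(range(1, K + 1), size):
            sums[S] = sum(rates[km] for km in separated_messages(K, decoders, S))
    return sums


# Two-user MAC: nodes 1 and 2 send W_11 and W_21 to node 3.
rates = {(1, 1): 0.3, (2, 1): 0.2}
decoders = {3: {(1, 1), (2, 1)}}
for S, r in cut_rate_sums(3, rates, decoders).items():
    print(S, r)
```

Cuts such as $\mathcal{S} = \{1,3\}$ separate no message and give trivial constraints, while $\{1\}$, $\{2\}$, and $\{1,2\}$ give the three rate constraints familiar from the multiaccess channel.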



A. Weakened Bounds 

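The bound (17) may be weakened as follows:

$$I(A_{\mathcal{S}}^L; Y_{\mathcal{S}^c}^L | A_{\mathcal{S}^c}^L) \overset{(a)}{=} \sum_{i=1}^{L} H(Y_{\mathcal{S}^c,i} | Y_{\mathcal{S}^c}^{i-1} A_{\mathcal{S}^c}^L) - H(Y_{\mathcal{S}^c,i} | Y_{\mathcal{S}^c}^{i-1} A_{\mathcal{K}}^i)$$
$$\le \sum_{i=1}^{L} H(Y_{\mathcal{S}^c,i} | Y_{\mathcal{S}^c}^{i-1} A_{\mathcal{S}^c}^i) - H(Y_{\mathcal{S}^c,i} | Y_{\mathcal{S}^c}^{i-1} A_{\mathcal{K}}^i)$$
$$= I(A_{\mathcal{S}}^L \to Y_{\mathcal{S}^c}^L \| A_{\mathcal{S}^c}^L) \qquad (21)$$

where (a) follows by the chain rule for entropy and because

$$(A_{\mathcal{K},i+1} \ldots A_{\mathcal{K},L}) - A_{\mathcal{K}}^i Y_{\mathcal{S}^c}^{i-1} - Y_{\mathcal{S}^c,i}$$

forms a Markov chain, and the inequality follows because conditioning cannot increase entropy. The bound (21) may be further weakened to replace code functions with channel inputs and outputs:

$$I(A_{\mathcal{S}}^L \to Y_{\mathcal{S}^c}^L \| A_{\mathcal{S}^c}^L) \overset{(a)}{\le} \sum_{i=1}^{L} H(Y_{\mathcal{S}^c,i} | Y_{\mathcal{S}^c}^{i-1} X_{\mathcal{S}^c}^i) - H(Y_{\mathcal{S}^c,i} | Y_{\mathcal{K}}^{i-1} X_{\mathcal{K}}^i A_{\mathcal{K}}^i)$$
$$\overset{(b)}{=} I(X_{\mathcal{S}}^L, 0Y_{\mathcal{S}}^{L-1} \to Y_{\mathcal{S}^c}^L \| X_{\mathcal{S}^c}^L) \qquad (22)$$

where (a) follows because $Y_{\mathcal{S}^c}^{i-1} A_{\mathcal{S}^c}^i$ defines $X_{\mathcal{S}^c}^i$ and because conditioning cannot increase entropy. Step (b) follows because (18) ensures that the chain $A_{\mathcal{K}}^i - Y_{\mathcal{K}}^{i-1} X_{\mathcal{K}}^i - Y_{\mathcal{S}^c,i}$ is Markov.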

Remark 8: The FDG of a NiBM has statistically independent code functions, see Fig. 1. We thus have

$$H(Y_{\mathcal{S}^c,i} | Y_{\mathcal{S}^c}^{i-1} X_{\mathcal{S}^c}^i A_{\mathcal{S}^c}^i) = H(Y_{\mathcal{S}^c,i} | Y_{\mathcal{S}^c}^{i-1} X_{\mathcal{S}^c}^i). \qquad (23)$$

However, the identity (23) may not be valid when considering dependent code functions such as in Theorem 1.

Remark 9: The cut-set bound with the normalized (22) in place of the right-hand side of (17) was derived in [8, Thm. 1] for causal relay networks and in [9, Thm. 1] for generalized networks. The authors of [8], [9] restrict attention to multiple unicast sessions as in [5, Sec. 15.10]. Combining Theorem 1 and (22) extends the bounds to multiple multicast sessions. We discuss these bounds in more detail in Sec. VI-C.

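Example 2: Consider additive noise channels with

$$Y_{k,i} = f_{k,i}(X_{\mathcal{K}}^i) + Z_{k,i} \qquad (24)$$

for $i = 1,2,\ldots,L$, $k = 1,2,\ldots,K$, where $Y_{k,i}$, $Z_{k,i}$, and $f_{k,i}(X_{\mathcal{K}}^i)$ take on values in the field $\mathbb{F}$. The noise variables are independent of $X_{\mathcal{K}}^L$. The bound (22) is

$$I(A_{\mathcal{S}}^L; Y_{\mathcal{S}^c}^L | A_{\mathcal{S}^c}^L) \le I(X_{\mathcal{S}}^L, 0Y_{\mathcal{S}}^{L-1} \to Y_{\mathcal{S}^c}^L \| X_{\mathcal{S}^c}^L)$$
$$= H(Y_{\mathcal{S}^c}^L \| X_{\mathcal{S}^c}^L) - H(Z_{\mathcal{S}^c}^L \| 0Z_{\mathcal{S}}^{L-1}). \qquad (25)$$

Since $H(Z_{\mathcal{S}^c}^L \| 0Z_{\mathcal{S}}^{L-1})$ is fixed by the channel, the cut-set bound with the normalized (25) in place of the right-hand side of (17) is a maximum (conditional) entropy problem.

Example 3: A special case of (24) is a deterministic NiBM for which the noise is a constant and

$$I(A_{\mathcal{S}}^L; Y_{\mathcal{S}^c}^L | A_{\mathcal{S}^c}^L) \le H(Y_{\mathcal{S}^c}^L \| X_{\mathcal{S}^c}^L). \qquad (26)$$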

B. DMNs 

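Suppose the NiBM is a DMN. We have $L = 1$ and recover the classic cut-set bound. Alternatively, we may view the DMN as an NiBM of length $L$ and with

$$P(y_{\mathcal{K}}^L \| x_{\mathcal{K}}^L) = \prod_{i=1}^{L} P_{Y_{\mathcal{K}}|X_{\mathcal{K}}}(y_{\mathcal{K},i} | x_{\mathcal{K},i}). \qquad (27)$$

The weakened bound (22) becomes

$$I(X_{\mathcal{S}}^L, 0Y_{\mathcal{S}}^{L-1} \to Y_{\mathcal{S}^c}^L \| X_{\mathcal{S}^c}^L) = \sum_{i=1}^{L} H(Y_{\mathcal{S}^c,i} | X_{\mathcal{S}^c}^i Y_{\mathcal{S}^c}^{i-1}) - H(Y_{\mathcal{S}^c,i} | X_{\mathcal{K},i})$$
$$\le \sum_{i=1}^{L} I(X_{\mathcal{S},i}; Y_{\mathcal{S}^c,i} | X_{\mathcal{S}^c,i}). \qquad (28)$$

If we choose the code functions as code words and

$$P(x_{\mathcal{K}}^L) = \prod_{i=1}^{L} P(x_{\mathcal{K},i}) \qquad (29)$$

then we achieve equality in (28). We recover the classic cut-set bound by choosing $P(x_{\mathcal{K},i}) = P_{X_{\mathcal{K}}}(x_{\mathcal{K},i})$ for all $i$.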



Fig. 2. FDG for a point-to-point channel with IBM of length L = 2 and n = 4 channel uses.



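Remark 10: Consider a DMN that is time varying in blocks of length $L$, i.e., we have an NiBM of length $L$ and

$$P(y_{\mathcal{K}}^L \| x_{\mathcal{K}}^L) = \prod_{i=1}^{L} P_{Y_{\mathcal{K},i}|X_{\mathcal{K},i}}(y_{\mathcal{K},i} | x_{\mathcal{K},i}). \qquad (30)$$

The cut-set bound of Theorem 1 may be computed with independent inputs as in (29).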

IV. Point-to-Point Channels

Consider a point-to-point channel with input $X^L$ taking on values in $\mathcal{X}^L$, receiver output $Y^L$ taking on values in $\mathcal{Y}^L$, and feedback $\tilde{Y}^L$ taking on values in $\tilde{\mathcal{Y}}^L$. A FDG for $L = 2$ and $n = 4$ is shown in Fig. 2.

Theorem 2: The capacity of a point-to-point channel with IBM of length $L$ is

$$C = \max_{P_{A^L}} I(A^L; Y^L)/L \qquad (31)$$

where $P(a^L, x^L, \tilde{y}^L, y^L)$ factors as

$$P(a^L)\, 1(x^L \| a^L, 0\tilde{y}^{L-1})\, P(\tilde{y}^L, y^L \| x^L). \qquad (32)$$

Proof: Achievability follows by random coding with a maximizing $P_{A^L}$. The converse follows by Theorem 1. ■

Remark 11: The distribution (32) gives

$$I(A^L; Y^L) = I(A^L \to Y^L). \qquad (33)$$

Remark 12: The feedback can be noisy.

Remark 13: In-block feedback can increase $C$ but across-block feedback does not increase $C$. This statement refines Shannon's classic theorem on feedback capacity [10, Thm. 6]. The transmitter can thus ignore $\tilde{Y}_{jL}$ for all $j$, e.g., we could remove $\tilde{Y}_2$ and $\tilde{Y}_4$ in Fig. 2 without changing $C$.

Remark 14: $I(A^L; Y^L)$ is concave in $P_{A^L}$ and the Arimoto-Blahut algorithm [11], [12] can perform the maximization (31).
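Remark 14 invokes the Arimoto-Blahut algorithm. As a sketch (our own plain-Python implementation, not code from the paper), the iteration below maximizes $I(A;Y)$ for a finite channel matrix `W[a][y]` $= P(y|a)$ and stops when Blahut's lower and upper bounds on capacity meet. By Remark 3, one may index the rows by code functions $a^L$ and the columns by output strings $y^L$, and then divide the result by $L$ as in (31).

```python
import math


def blahut_arimoto(W, max_iter=10000, tol=1e-12):
    """Maximize I(A;Y) over P_A for W[a][y] = P(y|a).

    Returns (value in bits, input pmf). Each step applies the update
    p(a) <- p(a) exp(D(W(.|a) || q)) / z, where q is the output pmf, and
    stops when Blahut's bounds log(z) <= C <= max_a D(W(.|a) || q) meet.
    """
    n_a, n_y = len(W), len(W[0])
    p = [1.0 / n_a] * n_a
    for _ in range(max_iter):
        q = [sum(p[a] * W[a][y] for a in range(n_a)) for y in range(n_y)]
        d = [sum(W[a][y] * math.log(W[a][y] / q[y]) for y in range(n_y) if W[a][y] > 0)
             for a in range(n_a)]                      # divergences in nats
        z = sum(p[a] * math.exp(d[a]) for a in range(n_a))
        lower, upper = math.log(z), max(d)             # lower <= capacity <= upper
        if upper - lower < tol:
            break
        p = [p[a] * math.exp(d[a]) / z for a in range(n_a)]
    return lower / math.log(2), p


# Z-channel: input 0 is noiseless, input 1 is received as 0 or 1 with probability 1/2.
cap, p_opt = blahut_arimoto([[1.0, 0.0], [0.5, 0.5]])
print(cap, math.log2(1.25))
```

The Z-channel is a convenient check because its capacity, $\log_2(5/4)$ bits with $P_A(1) = 2/5$, is known in closed form; concavity (Remark 14) is what guarantees that the iteration converges to the global maximum.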
The cardinality $|\mathcal{A}^L|$ is bounded by the channel alphabets (see Remark 5) and we have

$$|\mathcal{A}^L| = \prod_{i=1}^{L} |\mathcal{X}_i|^{|\tilde{\mathcal{Y}}^{i-1}|}. \qquad (34)$$

Unfortunately, (34) means that $|\mathcal{A}^L|$ grows doubly exponentially in $L$ if the alphabet sizes are similar for all $i$. We prove the following theorem by using classic results [13, p. 96], [14, p. 310] on bounding set sizes.

Theorem 3: The maximum in Theorem 2 is achieved by a $P_{A^L}$ for which $|\mathrm{supp}(P_{A^L})|$ is at most

$$\min\Big(|\mathcal{Y}^L|,\ 1 + \sum_{i=1}^{L} |\mathcal{X}^{i-1}| \cdot |\tilde{\mathcal{Y}}^{i-1}| \cdot (|\mathcal{X}_i| - 1)\Big). \qquad (35)$$

Proof: See Appendix B. ■
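The counts in (34) and Theorem 3 are easy to tabulate. The Python sketch below (hypothetical function names) evaluates (34) and the bound $\min(|\mathcal{Y}^L|, 1 + \sum_i |\mathcal{X}^{i-1}| |\tilde{\mathcal{Y}}^{i-1}| (|\mathcal{X}_i| - 1))$, which is our reading of (35). As a consistency check, this form reproduces the special cases (48) and (50) below.

```python
from math import prod


def num_code_functions(x_sizes, fb_sizes):
    """|A^L| from (34): prod_i |X_i| ** |Ytilde^{i-1}|.

    x_sizes[i] = |X_{i+1}| and fb_sizes[i] = |Ytilde_{i+1}| (feedback alphabet sizes).
    """
    count, fb_strings = 1, 1              # fb_strings = |Ytilde^{i-1}|
    for x, fb in zip(x_sizes, fb_sizes):
        count *= x ** fb_strings          # one input symbol per past feedback string
        fb_strings *= fb
    return count


def support_bound(x_sizes, fb_sizes, y_sizes):
    """Theorem 3: min(|Y^L|, 1 + sum_i |X^{i-1}| |Ytilde^{i-1}| (|X_i| - 1))."""
    total, past = 1, 1                    # past = |X^{i-1}| * |Ytilde^{i-1}|
    for x, fb in zip(x_sizes, fb_sizes):
        total += past * (x - 1)
        past *= x * fb
    return min(prod(y_sizes), total)


# Example 4: binary inputs, binary feedback, binary receiver outputs, L = 2.
print(num_code_functions([2, 2], [2, 2]), support_bound([2, 2], [2, 2], [2, 2]))
# Doubly exponential growth of (34) versus the bound for binary alphabets.
for L in range(1, 6):
    print(L, num_code_functions([2] * L, [2] * L), support_bound([2] * L, [2] * L, [2] * L))
```

For Example 4 this gives 8 code trees in total but a support of at most 4, which matches the four code trees used there.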
Remark 15: Theorem 3 ensures that $|\mathrm{supp}(P_{A^L})|$ need grow only exponentially, and not doubly exponentially, in $L$. Of course, one must still determine $\mathrm{supp}(P_{A^L})$, which can be a high-complexity search problem for even small $L$.

Example 4: Consider a channel with $L = 2$ that has binary symmetric channels (BSCs) with

$$Y_1 = X_1 \oplus Z_1, \quad \tilde{Y}_1 = Z_1, \quad Y_2 = X_2 \oplus Z_1 \oplus Z_2. \qquad (36)$$

The bits $Z_1$ and $Z_2$ are independent with $P_{Z_1}(1) = \epsilon_1$ and $P_{Z_2}(1) = \epsilon_2$. This is an additive noise channel of the form (24), and for $\mathcal{S} = \{1\}$ we compute (see (25))

$$H(Z_{\mathcal{S}^c}^2 \| 0Z_{\mathcal{S}}^{1}) = H(Z_1) + H(Z_1 \oplus Z_2 | Z_1) = H_2(\epsilon_1) + H_2(\epsilon_2). \qquad (37)$$

To compute the capacity, consider the steps

$$I(A^2; Y^2) = H(Y^2) - [H(Y_1|X_1) + H(Y_2|Y_1 X_1 A^2)]$$
$$\overset{(a)}{=} H(Y^2) - [H_2(\epsilon_1) + H_2(\epsilon_2)]$$
$$\overset{(b)}{\le} 2 - [H_2(\epsilon_1) + H_2(\epsilon_2)] \qquad (38)$$

where (a) follows because $Y_1 X_1 A^2$ determine $Z_1 \tilde{Y}_1 X_2$, and with equality in (b) if $X_1$ and $X_2$ are independent and uniformly distributed bits. The capacity is thus the right-hand side of (38) divided by $L = 2$, i.e., $C = 1 - [H_2(\epsilon_1) + H_2(\epsilon_2)]/2$.

For instance, we may transmit $X_2 = X_2' \oplus Z_1$ where $X_2'$ is a uniform bit independent of $X_1$. We translate this strategy into a code function (here a code tree) distribution. We label $A^2$ as $b, b_0 b_1$, by which we mean that $X_1 = b$, $X_2 = b_0$ if $\tilde{Y}_1 = 0$, and $X_2 = b_1$ if $\tilde{Y}_1 = 1$. We choose

$$P_{A^2}(0,00) = P_{A^2}(0,11) = P_{A^2}(1,00) = P_{A^2}(1,11) = 0$$
$$P_{A^2}(0,01) = P_{A^2}(0,10) = P_{A^2}(1,01) = P_{A^2}(1,10) = 1/4$$

and achieve capacity with four code trees, as predicted by Theorem 3.
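The claim that four code trees suffice can be verified by brute force. The Python sketch below (hypothetical helper names; $\epsilon_1 = 0.1$ and $\epsilon_2 = 0.2$ are arbitrary test values) enumerates the noise, applies the code trees $b, b_0 b_1$ to the channel $Y_1 = X_1 \oplus Z_1$, $\tilde{Y}_1 = Z_1$, $Y_2 = X_2 \oplus Z_1 \oplus Z_2$, and compares $I(A^2;Y^2)$ with $2 - H_2(\epsilon_1) - H_2(\epsilon_2)$.

```python
from itertools import product
from math import log2


def h2(e):
    """Binary entropy function in bits."""
    return 0.0 if e in (0.0, 1.0) else -e * log2(e) - (1 - e) * log2(1 - e)


def mutual_information(joint):
    """I(A;Y) in bits from a dict {(a, y): probability}."""
    pa, py = {}, {}
    for (a, y), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * log2(p / (pa[a] * py[y])) for (a, y), p in joint.items() if p > 0)


def example4_joint(e1, e2, code_trees):
    """Joint pmf of the code tree a = (b, b0, b1) and Y^2 = (Y_1, Y_2)."""
    joint = {}
    for (a, pa), z1, z2 in product(code_trees.items(), (0, 1), (0, 1)):
        b, b0, b1 = a
        pz = (e1 if z1 else 1 - e1) * (e2 if z2 else 1 - e2)
        x1 = b
        x2 = b0 if z1 == 0 else b1        # feedback Ytilde_1 = Z_1 picks the branch
        y = (x1 ^ z1, x2 ^ z1 ^ z2)
        joint[(a, y)] = joint.get((a, y), 0.0) + pa * pz
    return joint


trees = {(0, 0, 1): 0.25, (0, 1, 0): 0.25, (1, 0, 1): 0.25, (1, 1, 0): 0.25}
e1, e2 = 0.1, 0.2
print(mutual_information(example4_joint(e1, e2, trees)), 2 - h2(e1) - h2(e2))
```

The two printed numbers agree up to floating-point rounding: with these trees, $Y_1 = b \oplus Z_1$ and $Y_2 = b_0 \oplus Z_2$.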

Example 5: We show the deficiencies of the weakened bound based on (22). Suppose the channel is

$$Y_1 = X_1 \oplus Z_1 \oplus Z_2, \quad \tilde{Y}_1 = Z_1, \quad Y_2 = Z_2 \qquad (39)$$

where $Z_1$ and $Z_2$ are independent with $P_{Z_1}(1) = \epsilon_1$ and $P_{Z_2}(1) = \epsilon_2$. We achieve

$$C = (1 - H_2(\epsilon_1))/2$$

by having the receiver compute $Y_1 \oplus Y_2 = X_1 \oplus Z_1$. In other words, we achieve capacity with uniform $X_1$, and so we require only two code trees, $A^2 = 0,00$ and $A^2 = 1,00$.

For the weakened bound (25), observe that (39) has the form (24). Defining $\epsilon_1 * \epsilon_2 = \epsilon_1(1-\epsilon_2) + (1-\epsilon_1)\epsilon_2$ and $\mathcal{S} = \{1\}$ we compute

$$H(Z_{\mathcal{S}^c}^2 \| 0Z_{\mathcal{S}}^1) = H(Z_1 \oplus Z_2) + H(Z_2 | Z_1 \oplus Z_2, Z_1) = H_2(\epsilon_1 * \epsilon_2). \qquad (40)$$

The weakened bound (25) is therefore

$$2C \le \max H(Y^2) - H_2(\epsilon_1 * \epsilon_2) = 1 + H_2(\epsilon_2) - H_2(\epsilon_1 * \epsilon_2) \qquad (41)$$

with equality if $X_1$ is uniform. This bound is loose in general, e.g., if $\epsilon_1 = 1/2$ then $C = 0$ but (41) gives $C \le H_2(\epsilon_2)/2$.
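The gap between the true capacity and the weakened bound in Example 5 is easy to tabulate. The Python sketch below (hypothetical helper names; $\epsilon_2 = 0.1$ is an arbitrary test value) computes $I(X_1;Y^2)$ by enumeration for uniform $X_1$ on the channel $Y_1 = X_1 \oplus Z_1 \oplus Z_2$, $\tilde{Y}_1 = Z_1$, $Y_2 = Z_2$, and compares half of it with half of the right-hand side of (41).

```python
from itertools import product
from math import log2


def h2(e):
    """Binary entropy function in bits."""
    return 0.0 if e in (0.0, 1.0) else -e * log2(e) - (1 - e) * log2(1 - e)


def conv(a, b):
    """Binary convolution a * b = a(1 - b) + (1 - a)b."""
    return a * (1 - b) + (1 - a) * b


def rate_uniform_x1(e1, e2):
    """I(X_1; Y^2) in bits for uniform X_1 (X_2 does not affect the outputs)."""
    joint, py = {}, {}
    for x1, z1, z2 in product((0, 1), repeat=3):
        p = 0.5 * (e1 if z1 else 1 - e1) * (e2 if z2 else 1 - e2)
        y = (x1 ^ z1 ^ z2, z2)                     # (Y_1, Y_2)
        joint[(x1, y)] = joint.get((x1, y), 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * log2(p / (0.5 * py[y])) for (x, y), p in joint.items() if p > 0)


def weakened_bound(e1, e2):
    """Half of the right-hand side of (41): (1 + H2(e2) - H2(e1 * e2)) / 2."""
    return (1 + h2(e2) - h2(conv(e1, e2))) / 2


for e1 in (0.05, 0.2, 0.5):
    print(e1, rate_uniform_x1(e1, 0.1) / 2, weakened_bound(e1, 0.1))
```

At $\epsilon_1 = 1/2$ the achievable rate drops to zero while the weakened bound stays at $H_2(\epsilon_2)/2$, which is the looseness noted above.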

A. Noise-Free Feedback 

The feedback is noise-free if $\tilde{Y}^L$ is a causal function of $X^L$ and $Y^L$, i.e., if $\tilde{Y}_i = f_i(X^i, Y^i)$ for $i = 1,2,\ldots,L$. For instance, the channel (36) has noise-free feedback. The expression (31) simplifies to

$$C = \max I(X^L \to Y^L)/L. \qquad (42)$$

Example 6: Consider the additive noise channel (15) with

$$\underline{Y} = G\,\underline{X} + \underline{Z} \qquad (43)$$

where $\underline{Z}$ is independent of $\underline{X}$. We compute

$$H(Y^L \| X^L) = H(Z^L) \qquad (44)$$

so that computing (42) reduces to maximizing $H(Y^L)$. For instance, for modulo-additive channels the maximizing $X^L$ will be uniformly distributed over $\mathbb{F}^L$, and for additive Gaussian noise (AGN) channels with transmit power constraints the maximizing $X^L$ will be Gaussian.

B. Block Fading Channels 

Channels with block interference [15] or block fading [16] have a state $S$ that is memoryless across blocks of length $L$ and whose realization $S = s$ specifies a memoryless channel in each block. In other words, when $S = s$ we have

$$P(\tilde{y}^L, y^L \| x^L | s) = \prod_{i=1}^{L} P_{\tilde{Y}Y|XS}(\tilde{y}_i, y_i | x_i, s). \qquad (45)$$

We may view such channels as NiBMs for which $Z = S N^L$, i.e., $Z$ includes the state $S$ and a noise string $N^L$ where the $N_i$, $i = 1,2,\ldots,L$, are statistically independent and identically distributed. Equation (3) thus becomes

$$\tilde{Y}_i = \tilde{f}_{t(i)+1}(X_{i-t(i)}, \ldots, X_i, S_{\lceil i/L \rceil}, N_i) \qquad (46)$$
$$Y_i = f_{t(i)+1}(X_{i-t(i)}, \ldots, X_i, S_{\lceil i/L \rceil}, N_i) \qquad (47)$$

for $i = 1,2,\ldots,n$.
Fig. 3. FDG for a channel with state known causally at the encoder. The NiBM has L = 2. The message estimate Ŵ is not shown.

Fig. 4. FDG for a channel with action-dependent state known causally at the encoder. The NiBM has L = 2 and the actions are B_1 and B_2.



C. Channels with State Known Causally at the Encoder 

Shannon's channel [17] with state known causally at the encoder is a point-to-point channel with IBM of length $L = 2$. To see this, choose $\mathcal{X}_1 = \mathcal{Y}_1 = \{0\}$ and $\tilde{Y}_1 = S$, where $S$ is the state. A FDG is shown in Fig. 3, where we have renamed the random variables $A$, $S$, $X$, $Y$, and $Z$. In the standard model, the noise $Z$ has two components $SZ'$, where $Z'$ is independent of $S$ and $Y = f(X, S, Z')$ for some function $f(\cdot)$.

The capacity is given by Theorem 2 after rescaling by the factor $L = 2$. Moreover, the alphabet size of $A$ is $|\mathcal{X}|^{|\mathcal{S}|}$ but (35) tells us that

$$|\mathrm{supp}(P_A)| \le \min\big(|\mathcal{Y}|,\ 1 + |\mathcal{S}| \cdot (|\mathcal{X}| - 1)\big) \qquad (48)$$

suffices. The $|\mathcal{Y}|$ bound is due to Shannon [17] and the second bound was reported in [18, Thm. 1].

Example 7: Suppose that $\mathcal{S} = \mathcal{X} = \mathcal{Y} = \{0,1\}$, $P_S(0) = 1/2$, and $Y = X \oplus S$. We label the branch-pairs $A$ as $b_0 b_1$, by which we mean that $X = b_0$ if $S = 0$ and $X = b_1$ if $S = 1$. The capacity is clearly 1, and by choosing

$$P_A(00) = P_A(11) = 0$$
$$P_A(01) = P_A(10) = 1/2$$

we attain $I(A;Y) = 1$ with two code trees, as predicted by Theorem 3. Note that $I(X;Y) = 0$ is not the capacity. The weakened bound (22) happens to give $C = I(XS;Y) = 1$.
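Example 7 can be checked by enumeration. The Python sketch below (hypothetical helper names) builds the joint pmf of $(A, X, Y)$ for the two code trees $A = 01$ and $A = 10$ and evaluates both $I(A;Y)$ and $I(X;Y)$, showing that only the code-function view captures the capacity.

```python
from math import log2


def mutual_information(joint, i, j):
    """I(U_i; U_j) in bits for a pmf over tuples (u_0, u_1, ...)."""
    pij, pi, pj = {}, {}, {}
    for outcome, p in joint.items():
        u, v = outcome[i], outcome[j]
        pij[(u, v)] = pij.get((u, v), 0.0) + p
        pi[u] = pi.get(u, 0.0) + p
        pj[v] = pj.get(v, 0.0) + p
    return sum(p * log2(p / (pi[u] * pj[v])) for (u, v), p in pij.items() if p > 0)


def example7_joint(p_s0=0.5):
    """Joint pmf of (A, X, Y) with A = (b0, b1), X = b_S, and Y = X xor S."""
    code_trees = {(0, 1): 0.5, (1, 0): 0.5}
    joint = {}
    for a, pa in code_trees.items():
        for s, ps in ((0, p_s0), (1, 1 - p_s0)):
            x = a[s]                      # branch selected by the causally known state
            y = x ^ s
            joint[(a, x, y)] = joint.get((a, x, y), 0.0) + pa * ps
    return joint


joint = example7_joint()
print("I(A;Y) =", mutual_information(joint, 0, 2), " I(X;Y) =", mutual_information(joint, 1, 2))
```

The tree $A = 01$ always produces $Y = 0$ and $A = 10$ always produces $Y = 1$, so one bit passes through $A$ even though $X$ alone is independent of $Y$.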

Remark 16: The above construction extends in an obvious way to show that any DMN with state(s) known causally at the encoder(s) is effectively an NiBM of length $L = 2$. The cut-set bound (16) thus applies to these problems.

D. Channels with Action-Dependent State 

Weissman's channel with action-dependent state lets the transmitter influence the state [19]. If the state is available causally at the encoder, then this model is a point-to-point channel with IBM of length $L = 2$. We treat the model shown in Fig. 4: at time $i = 1$ the action $B$ leads to the feedback state $S$, and at time $i = 2$ the channel input and output are $X$ and $Y$, respectively. The code function at time 1 is simply the action $B$, and the code function $A$ at time 2 maps $S$ to $X$. Theorem 2 thus gives the capacity

$$2C = \max_{P_{BA}} I(BA; Y) \qquad (49)$$

and Theorem 3 gives

$$|\mathrm{supp}(P_{BA})| \le \min\big(|\mathcal{Y}|,\ |\mathcal{B}| + |\mathcal{B}| \cdot |\mathcal{S}| \cdot (|\mathcal{X}| - 1)\big). \qquad (50)$$

Remark 17: The expression (49) is the same as in [19, Thm. 2] because $U$ plays the role of $BA$.

Remark 18: The constraint (50) is slightly stronger than that in [19, Thm. 2].

Remark 19: The model in Fig. 4 seems more general than in [19] because $Z$ may influence both $S$ and $Y$. However, the associations described in [19, p. 5405] show that the original model includes more problems than apparent at first glance. (See also comments in [19, Sec. VII].)

Remark 20: The model in Fig. 4 may seem different than in [19] because $S$ may influence future actions as well as the present and future $X$. However, across-block feedback does not increase capacity (see Remark 13), so we may remove the $S$-to-$B$ functional dependence without affecting capacity. (See also comments in [19, Sec. VII] concerning feedback.)

Remark 21: We may add functional dependence from $B$ to $Y$ without changing the capacity expression. Similar comments are made in [19, p. 5398 and Sec. VII].

Example 8: Consider a channel with a rewrite option [19, Sec. V-A], which means that the $B$-to-$S$ and $X$-to-$Y$ channels are effectively the same. At time $i = 1$ the encoder "writes" on the $B$-to-$S$ channel. At time $i = 2$, if the encoder is happy with the outcome $S$, then it sends a no-rewrite symbol $N$, which means that $Y = S$. But if the encoder is unhappy with $S$, then it "rewrites" a symbol on the $X$-to-$Y$ channel.

We have $\mathcal{X} = \mathcal{B} \cup \{N\}$, $\mathcal{S} = \mathcal{Y}$, and the bound (50) is $|\mathrm{supp}(P_{BA})| \le |\mathcal{Y}|$. For example, suppose the $B$-to-$S$ channel is a BSC with crossover probability $\delta$, $0 < \delta < 1/2$ (see [19]). We label $BA$ as $b, b_0 b_1$, by which we mean that $B = b$, $X = b_0$ if $S = 0$, and $X = b_1$ if $S = 1$. We have $|\mathcal{Y}| = 2$ and achieve $2C = I(BA;Y) = 1 - H_2(\delta^2)$ by choosing

$$P_{BA}(0, N0) = P_{BA}(1, 1N) = 1/2.$$

We require only two code trees, as predicted by Theorem 3.
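The rate in Example 8 can be confirmed by enumeration. The sketch below (hypothetical helper names; the string `"N"` stands for the no-rewrite symbol) assumes, as in the example, that the write and rewrite channels are the same BSC with crossover probability $\delta$, and compares $I(BA;Y)$ with $1 - H_2(\delta^2)$ for the two code trees $(0, N0)$ and $(1, 1N)$.

```python
from math import log2

N = "N"  # no-rewrite symbol: the output repeats the state, Y = S


def h2(e):
    """Binary entropy function in bits."""
    return 0.0 if e in (0.0, 1.0) else -e * log2(e) - (1 - e) * log2(1 - e)


def rewrite_joint(delta, code_trees):
    """Joint pmf of (b, (b0, b1)) and Y when B-to-S and X-to-Y are both BSC(delta)."""
    joint = {}
    for (b, branches), p in code_trees.items():
        for f1 in (0, 1):                   # flip on the B-to-S "write"
            for f2 in (0, 1):               # flip on the X-to-Y "rewrite"
                pf = (delta if f1 else 1 - delta) * (delta if f2 else 1 - delta)
                s = b ^ f1
                x = branches[s]             # X = b0 if S = 0, X = b1 if S = 1
                y = s if x == N else x ^ f2  # f2 only matters when a rewrite happens
                key = ((b, branches), y)
                joint[key] = joint.get(key, 0.0) + p * pf
    return joint


def mutual_information(joint):
    """I(A;Y) in bits from a dict {(a, y): probability}."""
    pa, py = {}, {}
    for (a, y), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * log2(p / (pa[a] * py[y])) for (a, y), p in joint.items() if p > 0)


trees = {(0, (N, 0)): 0.5, (1, (1, N)): 0.5}
delta = 0.1
print(mutual_information(rewrite_joint(delta, trees)), 1 - h2(delta ** 2))
```

An error now requires both a bad write and a bad rewrite, so the end-to-end channel from $B$ to $Y$ is a BSC with crossover probability $\delta^2$.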
Remark 22: Multiple rewrites are modeled by increasing L. 






V. Multiuser Channels 

A. Multiaccess Channels 

Consider a two-user (three-terminal) MAC with IBM and with inputs $X_1^L$, $X_2^L$, and outputs $Y_1^L$, $Y_2^L$, $Y^L$ (the transmitters observe $Y_1^L$ and $Y_2^L$ as feedback, and the receiver observes $Y^L$). The FDG for $L = 2$ and $n = 4$ is the same as Fig. 1 except that the receiver outputs $Y_i$, $i = 1,2,3,4$, are missing in Fig. 1. The cut-set bound of Theorem 1 is

$$\bigcup_{P_{A_1^L A_2^L}} \left\{ (R_1, R_2) : \begin{array}{l} 0 \le R_1,\ 0 \le R_2 \\ R_1 \le I(A_1^L; Y_2^L Y^L | A_2^L)/L \\ R_2 \le I(A_2^L; Y_1^L Y^L | A_1^L)/L \\ R_1 + R_2 \le I(A_1^L A_2^L; Y^L)/L \end{array} \right\}. \qquad (51)$$

If there is no feedback, then the cut-set bound can be strengthened in the usual way to

$$\bigcup \left\{ (R_1, R_2) : \begin{array}{l} 0 \le R_1,\ 0 \le R_2 \\ R_1 \le I(X_1^L; Y^L | X_2^L T)/L \\ R_2 \le I(X_2^L; Y^L | X_1^L T)/L \\ R_1 + R_2 \le I(X_1^L X_2^L; Y^L | T)/L \end{array} \right\} \qquad (52)$$

where the union is over distributions such that $X_1^L - T - X_2^L$ forms a Markov chain ($T$ is the usual time-sharing random variable). This modified cut-set bound is the capacity region without feedback. The result is not new, however, since the model is a special case of a classic MAC with vector alphabets.

Remark 23: MACs with state known causally at the encoders were treated in [22, Sec. IV]. As pointed out in Remark 16, such channels are NiBMs of length $L = 2$. For example, the outer bound of Theorem 3 in [22, Sec. IV] is the same as the cut-set bound of Theorem 1.

B. Multiaccess Channels with Feedback 

Several capacity results for DMNs generalize to problems with IBM. For example, consider Willems' result [20] that the Cover-Leung region [21] is $\mathcal{C}$ for full feedback ($Y_1 = Y_2 = Y$) and where one channel input, say $X_1$, is a function of $Y$ and $X_2$. A natural generalization to MACs with IBM is to consider full feedback ($Y_{1,i} = Y_{2,i} = Y_i$) and require $X_{1,i} = f_i(X_2^i, Y^i)$ for $i = 1,2,\ldots,L$. A MAC of this type is the binary adder channel (BAC) with $\{0,1\}$ input alphabets and integer-addition output

$$\underline{Y} = G_1 \underline{X}_1 + G_2 \underline{X}_2 \qquad (53)$$

where $G_1$ and $G_2$ are lower-triangular matrices with $\{0,1\}$ entries, and where $G_1$ has ones on the diagonal.

Theorem 4: The capacity region of a MAC with IBM and full feedback and where $X_{1,i} = f_i(X_2^i, Y^i)$ for all $i$ is

$$\bigcup \left\{ (R_1, R_2) : \begin{array}{l} 0 \le R_1,\ 0 \le R_2 \\ R_1 \le I(A_1^L; Y^L | A_2^L V)/L \\ R_2 \le I(A_2^L; Y^L | A_1^L V)/L \\ R_1 + R_2 \le I(A_1^L A_2^L; Y^L)/L \end{array} \right\} \qquad (54)$$

where the union is over distributions that factor as

$$P(v) \Big[\prod_{k=1}^{2} P(a_k^L | v)\, 1(x_k^L \| a_k^L, 0y^{L-1})\Big] P(y^L \| x_1^L, x_2^L). \qquad (55)$$

A cardinality bound on $\mathcal{V}$ is $|\mathcal{V}| \le |\mathcal{Y}^L|$.

Proof: The proof mimics that in [20] and is given in Appendix D. ■

Proposition 1: An alternative way of writing (54) is

$$\bigcup \left\{ (R_1, R_2) : \begin{array}{l} 0 \le R_1,\ 0 \le R_2 \\ R_1 \le I(X_1^L \to Y^L \| X_2^L | V)/L \\ R_2 \le I(X_2^L \to Y^L \| X_1^L | V)/L \\ R_1 + R_2 \le I(X_1^L X_2^L \to Y^L)/L \end{array} \right\} \qquad (56)$$

where the union is over distributions that factor as

$$P(v) \Big[\prod_{k=1}^{2} P(x_k^L \| 0y^{L-1} | v)\Big] P(y^L \| x_1^L, x_2^L). \qquad (57)$$

Proof: Consider the distribution (55). The chains

$$A_2^L - V X_2^i Y^{i-1} - Y_i \qquad (58)$$
$$A_1^L A_2^L - V X_1^i X_2^i Y^{i-1} - Y_i \qquad (59)$$

are Markov, so that

$$I(A_1^L; Y^L | A_2^L V) = \sum_{i=1}^{L} H(Y_i | A_2^L Y^{i-1} X_2^i V) - H(Y_i | A_1^L A_2^L Y^{i-1} X_1^i X_2^i V)$$
$$= I(X_1^L \to Y^L \| X_2^L | V). \qquad (60)$$

Similarly, the chain $A_1^L - V X_1^i Y^{i-1} - Y_i$ is Markov, which combined with the Markov chain (59) gives

$$I(A_2^L; Y^L | A_1^L V) = I(X_2^L \to Y^L \| X_1^L | V) \qquad (61)$$
$$I(A_1^L A_2^L; Y^L) = I(X_1^L X_2^L \to Y^L). \qquad (62)$$

The distribution (57) follows from (55). ■



C. Broadcast Channels 

Consider a two-user (three-terminal) BC with IBM. We label the transmitter inputs and outputs as $X^L$ and $Y^L$, respectively, and the receiver outputs as $Y_1^L$ and $Y_2^L$. Suppose there are only dedicated messages and no common message. The cut-set bound of Theorem 1 is

$$\bigcup_{P_{A^L}} \left\{ (R_1, R_2) : \begin{array}{l} 0 \le R_1 \le I(A^L; Y_1^L)/L \\ 0 \le R_2 \le I(A^L; Y_2^L)/L \\ R_1 + R_2 \le I(A^L; Y_1^L Y_2^L)/L \end{array} \right\}. \qquad (63)$$

An achievable region follows by extending Marton's region as in [1, Lemma 2]: the non-negative rate pair $(R_1, R_2)$ is achievable if it satisfies

$$LR_1 \le I(TU_1; Y_1^L)$$
$$LR_2 \le I(TU_2; Y_2^L)$$
$$L(R_1 + R_2) \le \min\big(I(T; Y_1^L), I(T; Y_2^L)\big) + I(U_1; Y_1^L | T) + I(U_2; Y_2^L | T) - I(U_1; U_2 | T) \qquad (64)$$

for some auxiliary random variables $TU_1U_2$ for which the joint distribution of the random variables factors as

$$P(t, u_1, u_2)\, P(x^L \| 0y^{L-1} | t, u_1, u_2)\, P(y^L, y_1^L, y_2^L \| x^L). \qquad (65)$$

Marton's region is known to be the same as (63) for $L = 1$ and deterministic broadcast channels. For $L > 1$, suppose that $Y_{1,i}$ and $Y_{2,i}$ are functions of $X^i$ for all $i$. We may choose $T = 0$, $U_1 = Y_1^L$, and $U_2 = Y_2^L$ without violating the Markov condition (65) and achieve

$$\bigcup_{P_{X^L}} \left\{ (R_1, R_2) : \begin{array}{l} 0 \le R_1 \le H(Y_1^L)/L \\ 0 \le R_2 \le H(Y_2^L)/L \\ R_1 + R_2 \le H(Y_1^L Y_2^L)/L \end{array} \right\}. \qquad (66)$$

The cut-set region (63) is the same as (66), and therefore (66) is $\mathcal{C}$. In fact, feedback does not increase capacity because the transmitter knows, and controls, the channel outputs.

Remark 24: The capacity region of a physically degraded BC with two receivers and state known causally at the encoder was derived in [22, Sec. II]. Such channels are NiBMs of length $L = 2$, see Remark 16. The cut-set bound of Theorem 1 is loose, but the capacity region is achieved by using the coding method described above. In particular, we choose $U_2$ in (64)-(65) to be a constant and recover the achievability part of Theorem 1 in [22, Sec. II].

D. Interference Channels 

The cut-set bound is often not so interesting for BCs or interference channels (ICs) with $L = 1$ because better capacity bounds exist. The same will be true for $L > 1$. On the other hand, studying extensions of existing bounds and achievable regions is interesting, e.g., extensions of the Han-Kobayashi region [23] to $L > 1$. It may also be interesting to study interference alignment [24], [25] for NiBMs.

VI. Relay Networks 

Our study of NiBMs was motivated by results on causal relay networks [8] and generalized networks [9]. These networks effectively extend relay networks with delays [7] in the sense that for every relay network with delays there is a causal relay network having the same capacity region. Furthermore, causal relay networks and generalized networks are special NiBMs. This section focuses on relay networks with IBM and applies Theorem 1 to this class of problems.

A. Relay Channels 

Consider a three-node relay channel (RC) with IBM and source inputs $X^L$, relay inputs $X_1^L$ and outputs $Y_1^L$, and destination outputs $Y^L$. The RC is a special case of the MAC in Sec. V-A where node 2 (the relay) has no message and node 1 (the source) has no feedback. A FDG for $L = 2$ and $n = 4$ is shown in Fig. 5. The cut-set bound of Theorem 1 is

$$LC \le \max \min\big(I(X^L; Y_1^L Y^L | A_1^L),\ I(X^L A_1^L; Y^L)\big) \qquad (67)$$

where the maximization is over $P_{X^L A_1^L}$.

We next list several classic coding strategies [26], [27]. The achievable rates follow by Remark 3 and standard random coding arguments (see [?, Sec. VI]).

• Decode-forward (DF) achieves rates $R$ satisfying

$$LR = \max \min\big(I(X^L; Y_1^L | A_1^L),\ I(X^L A_1^L; Y^L)\big) \qquad (68)$$

where the maximization is over $P_{X^L A_1^L}$ and where the joint distribution factors as

$$P(x^L, a_1^L)\, 1(x_1^L \| a_1^L, 0y_1^{L-1})\, P(y_1^L, y^L \| x^L, x_1^L). \qquad (69)$$

• Partial decode-forward (PDF) achieves $R$ satisfying

$$LR = \max \min\big(I(U; Y_1^L | A_1^L) + I(X^L; Y^L | A_1^L U),\ I(X^L A_1^L; Y^L)\big) \qquad (70)$$

where the maximization is over $P_{U X^L A_1^L}$ and where the joint distribution factors as

$$P(u, x^L, a_1^L)\, 1(x_1^L \| a_1^L, 0y_1^{L-1})\, P(y_1^L, y^L \| x^L, x_1^L). \qquad (71)$$

The rate (70) generalizes [1, Prop. 5].

• Compress-forward (CF) achieves $R$ satisfying

$$LR = \max \min\big(I(X^L; \hat{Y}_1^L Y^L | A_1^L T),\ I(X^L A_1^L; Y^L | T) - I(\hat{Y}_1^L; Y_1^L | X^L A_1^L Y^L T)\big) \qquad (72)$$

where the maximization is over joint distributions that factor as

$$P(t)\, P(x^L|t)\, P(a_1^L|t)\, 1(x_1^L \| a_1^L, 0y_1^{L-1})\, P(\hat{y}_1^L | a_1^L, y_1^L, t)\, P(y_1^L, y^L \| x^L, x_1^L). \qquad (73)$$

Example 9: Remark 3 states that we can view the channel as being $P(y_1^L, y^L | x^L, a_1^L)$. The RC is physically degraded if the chain $X^L - A_1^L Y_1^L - Y^L$ is Markov, so that $I(X^L; Y^L | A_1^L Y_1^L) = 0$. The DF rate (68) thus matches (67). This capacity result generalizes [7, Prop. 6].

Remark 25: Physically degraded RCs with state known causally at the encoder are treated in [22, Sec. III]. Such channels are NiBMs of length $L = 2$ (see Remark 16) and Theorem 1 gives the converse for Theorem 2 in [22, Sec. IV]. However, these channels fall outside the class of RCs treated in this section because the source node receives the channel state as "feedback".

Example 10: Suppose the RC is semi-deterministic in the sense that $Y_{1,i} = f_i(X^i, X_1^i)$ for $i = 1,2,\ldots,L$. We may choose $U = Y_1^L$, and (70) becomes the cut-set bound (67). This capacity result generalizes [7, Prop. 7].

Example 11: Suppose the RC is semi-deterministic in the (more general) sense that $Y_{1,i} = f_i(X^i, X_1^i, Y^i)$ for $i = 1,2,\ldots,L$. Consider (72), for which we have

$$I(\hat{Y}_1^L; Y_1^L | X^L A_1^L Y^L T) = 0. \qquad (74)$$

We choose $T$ as a constant and $\hat{Y}_1^L = Y_1^L$ so that (72) is the right-hand side of (67) but with independent $X^L$ and $A_1^L$.

Example 12: A special case of Example 11 is where $Y_{1,i} = f_i(X^i, Y^i)$ and there is a separate channel with IBM and capacity $R_0$ from the relay to the destination (see [28]). The best $X^L$ and $A_1^L$ are independent, so the choice $\hat{Y}_1^L = Y_1^L$ lets CF achieve the cut-set bound (67).



B. Relays without Delay 

A relay network with delays is an NiBM in the sense that relabeling time indexes lets one convert the relay network into an NiBM. We are particularly interested in a relay without delay. The corresponding NiBM is a RC with IBM of length $L = 2$ and the random variables $X_1, Y_{1,1}, X_{1,2}, Y_2$, i.e., we have $\mathcal{X}_{1,1} = \mathcal{Y}_1 = \mathcal{Y}_{1,2} = \mathcal{X}_2 = \{0\}$. For simplicity, we rename the variables of interest as $X, Y_1, X_1, Y$, respectively, so that the channel is

$$P(y_1, y \| x, x_1) = P(y_1|x) \cdot P(y|x, x_1, y_1). \tag{75}$$

Fig. 5. FDG for a RC with IBM of length L = 2.

Fig. 6. FDG for a RC when the relay has no delay. The channel is a NiBM of length L = 2.

A FDG for 2 channel uses per node is shown in Fig. 6. Observe that this is a subgraph of Fig. 5 where $L = 2$, $n = 4$, and where nodes have been relabeled.
The cut-set bound (67) is

$$2C \le \max \min\big( I(X; Y_1 Y | A_1),\; I(X A_1; Y) \big) \tag{76}$$

where the maximization in (76) is over $P_{XA_1}$ and $|\mathcal{A}_1| = |\mathcal{X}_1|^{|\mathcal{Y}_1|}$. In fact, (76) is simply Willems' bound in [7, p. 3419]. We show in Appendix C that one can choose

$$|\mathrm{supp}(P_{A_1})| \le \min\big(|\mathcal{Y}| + 1,\; |\mathcal{X}|\cdot|\mathcal{X}_1| + 1\big). \tag{77}$$

Remark 26: The cardinality bound (77) almost agrees with [7, Thm. 2] since $V$ in [7, Thm. 2] is the code function index. However, the maximization in (76) has a smaller search space in general. To see this, observe that (76) requires optimizing $P_{XA_1}$ by considering at most $N_A = \min(|\mathcal{Y}| + 1, |\mathcal{X}|\cdot|\mathcal{X}_1| + 1)$ out of $|\mathcal{X}_1|^{|\mathcal{Y}_1|}$ code functions. We must therefore perform at most

$$\binom{|\mathcal{X}_1|^{|\mathcal{Y}_1|}}{N_A} \tag{78}$$

optimizations in $|\mathcal{X}| N_A - 1$ dimensions. In contrast, [7, Thm. 2] requires optimizing $P_{XV}$ for $|\mathcal{X}_1|^{N_V|\mathcal{Y}_1|}$ functions $f(\cdot): \mathcal{V} \times \mathcal{Y}_1 \to \mathcal{X}_1$, where $|\mathcal{V}|$ is at most $N_V = |\mathcal{X}|\cdot|\mathcal{X}_1| + 1$. We thus have at most $|\mathcal{X}_1|^{N_V|\mathcal{Y}_1|}$ optimizations in $|\mathcal{X}|\cdot N_V - 1$ dimensions. But we have $N_A \le N_V$ and

$$\binom{|\mathcal{X}_1|^{|\mathcal{Y}_1|}}{N_A} \le |\mathcal{X}_1|^{N_A|\mathcal{Y}_1|} \le |\mathcal{X}_1|^{N_V|\mathcal{Y}_1|}$$

so the optimization in (76) is generally simpler than the optimization in [7, Thm. 2]. This discussion shows that one may as well consider code functions directly rather than introducing auxiliary random variables and auxiliary functions.

Example 13: Suppose that $|\mathcal{X}| = |\mathcal{X}_1| = 2$ and $|\mathcal{Y}_1| = 4$. Then (77) states that at most 5 code functions (here code trees) out of 16 need have positive probability. Our search is thus over $\binom{16}{5} = 4368$ combinations of code trees. In comparison, [7, Thm. 2] requires a search over $2^{20} \approx 10^6$ mappings $f(\cdot)$.
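The counts in Remark 26 and Example 13 can be reproduced with a few lines of arithmetic. This is a sketch under the assumption that the two search spaces are exactly the binomial count (78) and the $|\mathcal{X}_1|^{N_V|\mathcal{Y}_1|}$ mappings of [7, Thm. 2]; the function name `search_sizes` is illustrative, and $|\mathcal{Y}| = 10$ is an assumed value large enough that the second term of (77) is the active one.

```python
from math import comb


def search_sizes(x, x1, y1, y):
    """Search-space sizes from Remark 26 (arguments are alphabet sizes).

    x, x1, y1, y stand for |X|, |X_1|, |Y_1|, |Y| of the relay without delay.
    """
    num_code_functions = x1 ** y1           # |A_1| = |X_1|^{|Y_1|}
    n_a = min(y + 1, x * x1 + 1)            # support bound (77)
    n_a = min(n_a, num_code_functions)      # cannot exceed the number of code functions
    supports = comb(num_code_functions, n_a)  # count (78)
    n_v = x * x1 + 1                        # bound on |V| in [7, Thm. 2]
    mappings = x1 ** (n_v * y1)             # functions f: V x Y_1 -> X_1
    return num_code_functions, n_a, supports, n_v, mappings


# Example 13: |X| = |X_1| = 2 and |Y_1| = 4 (|Y| = 10 is an assumption).
print(search_sizes(2, 2, 4, 10))  # (16, 5, 4368, 5, 1048576)
```

The final entry, $2^{20} = 1048576$, is the "$\approx 10^6$" quoted in Example 13.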

C. Causal Relay Networks and Generalized Networks 

Causal relay networks [8] and generalized networks [9] are NiBMs where every node transmits at most one channel input and receives at most one channel output in each block. For example, the FDG for one block of a causal relay network with $K = 5$ nodes is shown in Fig. 7. Nodes 1 and 2 are strictly causal relays and nodes 3, 4, and 5 are causal relays. This relay network is a NiBM of length $L = 3$.

In the language of [8], the strictly causal relays are in the set $\mathcal{N}_1 = \{1,2\}$ and the causal relays are in $\mathcal{N}_0 = \{3,4,5\}$. In the language of [9], we have the input and output partitions

$$\mathcal{S} = \big\{\mathcal{S}_1 = \{1,2\},\, \mathcal{S}_2 = \{3,4\},\, \mathcal{S}_3 = \{5\}\big\}, \qquad \mathcal{G} = \big\{\mathcal{G}_1 = \{3,4\},\, \mathcal{G}_2 = \{5\},\, \mathcal{G}_3 = \{1,2\}\big\}. \tag{79}$$



There are several cut bounds to consider. For example, consider $S = \{1,3\}$, for which [8] uses $U = S \cap \mathcal{N}_0 = \{3\}$, $W = \{4,5\}$, $V = S \cap \mathcal{N}_1 = \{1\}$, $\bar{V} = \{2\}$. Using Theorem 1, we have the bounds

$$\begin{aligned}
3R_{M(S)} &\overset{(a)}{\le} I(A_1A_3; Y_2Y_4Y_5 | A_2A_4A_5) \\
&\overset{(b)}{=} I(X_1; Y_4 | X_2A_4A_5) + I(X_1A_3; Y_5 | X_2X_4Y_4A_4A_5) + I(X_1A_3; Y_2 | X_2X_4X_5Y_4Y_5A_4A_5) \\
&\overset{(c)}{\le} I(X_1; Y_4 | X_2A_4A_5) + I(X_1X_3Y_3; Y_5 | X_2X_4Y_4A_4A_5) + I(X_1A_3; Y_2 | X_2X_4X_5Y_4Y_5A_4A_5) \\
&\overset{(d)}{\le} I(X_1; Y_4 | X_2) + I(X_1X_3Y_3; Y_5 | X_2X_4Y_4) + I(X_1X_3Y_3; Y_2 | X_2X_4X_5Y_4Y_5)
\end{aligned} \tag{80}$$

Fig. 7. FDG for a causal relay network with K = 5 nodes and n = 3 channel uses. The network is a NiBM of length L = 3.



where (a) is simply (17) and (b) follows by using the chain rule for mutual information and the Markovity in the channel. Step (c) follows because we have added $Y_3$ to the second mutual information expression and by using the Markovity in the channel. The result is the bound of [8, Thm. 2] when the causal relays do not have their own messages. Step (d) follows similarly and is the bound of [8, Thm. 1] and [9, Thm. 1] when the relays may or may not have their own messages.
Summarizing, we infer that: 

• Theorem 1 is at least as good as [8, Thm. 1 and 2] and [9, Thm. 1].

• Example 5 shows that Theorem 1 can be better than [8, Thm. 1] and [9, Thm. 1].

• If the causal relays have no messages, then Theorem 1 can be better than [8, Thm. 2] due to inequality (c); see Example 14 below. Furthermore, the auxiliary random variables $U_k$ in [8, Thm. 2] are not specified to be code functions. The optimization is thus in general more complex than when using Theorem 1 (see Remark 26).
Example 14: Consider Fig. 7 with $\mathcal{X}_k = \mathcal{Y}_k = \{0\}$ for $k = 2,4$, i.e., nodes 2 and 4 are removed from the problem. Consider $Y_3 = [X_1, Z]$ where $\mathcal{X}_1 = \{0,1\}$ and $P_Z(0) = P_Z(1) = 1/2$, and $Y_5 = Z$. Suppose there is only one message, with rate $R_{15}$, at node 1 destined for node 5 (so the causal relays at nodes 3 and 5 have no messages). We effectively have a RC with no delay, and the capacity is zero because $X_1A_3$ has no influence on $Y_5$. For instance, the cut-set bound of Theorem 1 with $S = \{1,3\}$ gives $3R_{15} \le I(X_1A_3; Y_5 | A_5) = 0$.

Next, consider the cut-set bound of [8, Thm. 2]. There are two cuts to consider without nodes 2 and 4. The cut $S = \{1,3\}$ gives (see (80) after step (c))

$$3R_{15} \le I(X_1X_3Y_3; Y_5 | A_5) = 1 \tag{81}$$

and the cut $S = \{1\}$ gives

$$3R_{15} \le I(X_1; Y_3Y_5 | A_3A_5) = H(X_1 | A_3A_5). \tag{82}$$

But we have $H(X_1 | A_3A_5) = 1$ by choosing $X_1$ independent of $A_3A_5$ and $P_{X_1}(0) = P_{X_1}(1) = 1/2$. Thus, the cut-set bound of [8, Thm. 2] is loose while Theorem 1 is tight.
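The three numbers in Example 14 (0 for Theorem 1, 1 in (81), and 1 in (82)) can be checked by direct enumeration of the joint pmf. This is a minimal sketch assuming that the code functions $A_3$ and $A_5$ are independent of $(X_1, Z)$ and that $X_3$ adds nothing beyond $A_3$ and $Y_3$, so the conditioning on code functions can be dropped; the helper `mutual_information` is a generic plug-in computation written for illustration, not a routine from the paper.

```python
from collections import defaultdict
from itertools import product
from math import log2


def mutual_information(joint, f, g):
    """I(F;G) in bits for a finite joint pmf {outcome: probability}."""
    pf, pg, pfg = defaultdict(float), defaultdict(float), defaultdict(float)
    for w, p in joint.items():
        a, b = f(w), g(w)
        pf[a] += p
        pg[b] += p
        pfg[(a, b)] += p
    return sum(p * log2(p / (pf[a] * pg[b])) for (a, b), p in pfg.items() if p > 0)


# Example 14: X_1 and Z are independent uniform bits; Y_3 = [X_1, Z] and Y_5 = Z.
joint = {(x1, z): 0.25 for x1, z in product((0, 1), repeat=2)}


def x1(w):
    return w[0]


def y3(w):
    return (w[0], w[1])


def y5(w):
    return w[1]


def x1_y3(w):
    return (x1(w), y3(w))


print(mutual_information(joint, x1, y5))     # Theorem 1 with S = {1,3}: 0.0
print(mutual_information(joint, x1_y3, y5))  # right-hand side of (81): 1.0
print(mutual_information(joint, x1, y3))     # H(X_1) in (82): 1.0
```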

Example 15: Consider the generalized network called a "BSC with correlated feedback" in [9, Sec. VI]. This network is a two-way channel with IBM of length $L = 2$ and with binary inputs and outputs

$$Y_{2,1} = X_{1,1} \oplus Z, \qquad Y_{1,2} = X_{2,2} \oplus Z$$

where $P_Z(1) = 1 - P_Z(0) = \epsilon$. The rate pair $(R_1, R_2) = (1 - H_2(\epsilon), 1)/2$ is achievable by choosing $X_{1,1}$ uniform over $\{0,1\}$ and $X_{2,2} = X_{2,2}' \oplus Y_{2,1}$, where $X_{2,2}'$ is independent of $Y_{2,1}$ and uniform over $\{0,1\}$. For the converse, the cut-set bound of Theorem 1 is

$$\bigcup_{P_{X_{1,1}A_{2,2}}} \left\{ (R_1, R_2) : \begin{array}{l} 0 \le R_1 \le I(X_{1,1}; Y_{2,1} | A_{2,2})/2 \\ 0 \le R_2 \le I(A_{2,2}; Y_{1,2} | X_{1,1})/2 \end{array} \right\} \tag{83}$$

and we have $I(X_{1,1}; Y_{2,1} | A_{2,2}) \le 1 - H_2(\epsilon)$ with equality if $X_{1,1}$ is uniform and independent of $A_{2,2}$. We further have $I(A_{2,2}; Y_{1,2} | X_{1,1}) \le 1$ since $Y_{1,2}$ is binary. This shows that Theorem 1 is tight.

Finally, we translate the capacity-achieving strategy into a code tree distribution. We label the branch-pairs of our tree $A_{2,2}$ as $b_0b_1$, by which we mean that $X_{2,2} = b_0$ if $Y_{2,1} = 0$ and $X_{2,2} = b_1$ if $Y_{2,1} = 1$. We choose $A_{2,2}$ independent of $X_{1,1}$ and

$$P_{A_{2,2}}(00) = P_{A_{2,2}}(11) = 0, \qquad P_{A_{2,2}}(01) = P_{A_{2,2}}(10) = 1/2$$

and compute $I(A_{2,2}; Y_{1,2} | X_{1,1}) = 1$, as desired.
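The code tree distribution at the end of Example 15 can be checked numerically by enumerating $(X_{1,1}, Z, A_{2,2})$. This sketch assumes the three variables are mutually independent, represents the tree $b_0b_1$ as the string `"01"` or `"10"`, and uses the illustrative value $\epsilon = 0.1$; `cond_mi` is a generic plug-in conditional mutual information, written here for illustration only.

```python
from collections import defaultdict
from itertools import product
from math import log2


def cond_mi(joint, f, g, h):
    """I(F;G|H) in bits for a finite joint pmf {outcome: probability}."""
    pfgh, pfh, pgh, ph = (defaultdict(float) for _ in range(4))
    for w, pw in joint.items():
        a, b, c = f(w), g(w), h(w)
        pfgh[(a, b, c)] += pw
        pfh[(a, c)] += pw
        pgh[(b, c)] += pw
        ph[c] += pw
    return sum(p * log2(p * ph[c] / (pfh[(a, c)] * pgh[(b, c)]))
               for (a, b, c), p in pfgh.items() if p > 0)


def h2(e):
    """Binary entropy function in bits."""
    return -e * log2(e) - (1 - e) * log2(1 - e)


eps = 0.1  # illustrative value of P_Z(1)

# Outcome w = (x11, z, tree): X_{1,1} uniform, Z ~ Bernoulli(eps), and the
# code tree A_{2,2} uniform over {"01", "10"}, all mutually independent.
joint = {(x, z, t): 0.5 * (eps if z else 1 - eps) * 0.5
         for x, z, t in product((0, 1), (0, 1), ("01", "10"))}


def x11(w):
    return w[0]


def tree(w):
    return w[2]


def y21(w):
    return w[0] ^ w[1]        # Y_{2,1} = X_{1,1} xor Z


def x22(w):
    return int(w[2][y21(w)])  # branch b_0 if Y_{2,1} = 0, b_1 if Y_{2,1} = 1


def y12(w):
    return x22(w) ^ w[1]      # Y_{1,2} = X_{2,2} xor Z, with the same Z


print(cond_mi(joint, tree, y12, x11))               # I(A_{2,2}; Y_{1,2} | X_{1,1})
print(cond_mi(joint, x11, y21, tree), 1 - h2(eps))  # I(X_{1,1}; Y_{2,1} | A_{2,2})
```

Because the same $Z$ appears in both outputs, $Y_{1,2} = X_{1,1} \oplus b$ for the tree $b\bar{b}$, so node 1 recovers the tree bit exactly once it knows $X_{1,1}$.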



D. Digital Network Coding 

The final channels we consider are relay networks with IBM. Suppose node 1 multicasts a message of rate $R$ to sink nodes in the set $\mathcal{T}$. The quantize-map-forward (QMF) and noisy network coding (NNC) strategies in [29], [30] generalize to NiBMs, and we call the resulting strategies with code functions digital network coding (DNC). DNC achieves $R$ satisfying

$$LR < \min_{k \in S^c \cap \mathcal{T}} \Big[ I(A_S^L; \hat{Y}_{S^c}^L Y_k^L | A_{S^c}^L T) - I(Y_S^L; \hat{Y}_S^L | A_{\mathcal{K}}^L \hat{Y}_{S^c}^L Y_k^L T) \Big] \tag{84}$$

for all $S \subset \mathcal{K}$ with $1 \in S$ and $S^c \cap \mathcal{T} \neq \emptyset$. The $A_k^L$, $k = 1,2,\ldots,K$, are independent and $\hat{Y}_k^L$ is a noisy function of $A_k^L$ and $Y_k^L$ for all $k$.

Remark 27: A simple lower bound on the first mutual information expression in (84) is

$$I(A_S^L; \hat{Y}_{S^c}^L Y_k^L | A_{S^c}^L T) \ge I(A_S^L; \hat{Y}_{S^c}^L | A_{S^c}^L T). \tag{85}$$

We use the right-hand side of (85) below because it better matches (17), with $\hat{Y}_{S^c}^L$ replacing $Y_{S^c}^L$.

Example 16: We extend results of [29], [30]. If the network is deterministic, then $A_{\mathcal{K}}^L$ determines $X_{\mathcal{K}}^L Y_{\mathcal{K}}^L$. We thus have

$$I(Y_S^L; \hat{Y}_S^L | A_{\mathcal{K}}^L \hat{Y}_{S^c}^L Y_k^L T) = 0 \tag{86}$$

and we can choose $\hat{Y}_k^L = Y_k^L$ to achieve the cut-set bound, but evaluated with independent code functions only. As a result, we obtain the multicast capacity of networks of deterministic point-to-point channels with IBM, for instance. However, DNC will not give the capacity region for all deterministic networks because dependent code functions may increase rates.



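E. DNC for Gaussian Networks

Consider the channel (15) with additive Gaussian noise (AGN), i.e., the $Z_k$ are Gaussian noise vectors and each $Z_k$ has a positive definite covariance matrix. For simplicity, we assume that $Z_1, Z_2, \ldots, Z_K$ are mutually independent.

Suppose again that node 1 multicasts a message of rate $R$ to sink nodes in $\mathcal{T}$. Let $S$ be a cut, i.e., $1 \in S$ and $S^c \cap \mathcal{T} \neq \emptyset$. We use the notation

$$Y_{S^c} = G_{S^cS} X_S + G_{S^cS^c} X_{S^c} + Z_{S^c} \tag{87}$$

for the equations (15) with $k \in S^c$, where $G_{UV}$ is a $|U|L \times |V|L$ matrix with block entries $G_{kj}$, $k \in U$, $j \in V$. We begin with the upper bound (25), which we write as

$$\begin{aligned}
I(X_S; Y_{S^c} | X_{S^c}) &= h(G_{S^cS} X_S + Z_{S^c} | X_{S^c}) - h(Z_{S^c}) \\
&\le h(G_{S^cS} X_S + Z_{S^c}) - h(Z_{S^c}) \\
&\overset{(a)}{\le} \frac{1}{2}\log\frac{|Q_{Z_{S^c}} + G_{S^cS} Q_{X_S} G_{S^cS}^T|}{|Q_{Z_{S^c}}|}
\end{aligned} \tag{88}$$

where (a) follows by the maximum entropy theorem. The (positive definite) noise covariance matrix has a Cholesky decomposition $Q_{Z_{S^c}} = \Sigma_{S^c}\Sigma_{S^c}^T$ where $\Sigma_{S^c}$ is lower triangular and invertible. We can thus rewrite (88) as

$$I(X_S; Y_{S^c} | X_{S^c}) \le \frac{1}{2}\log\left|I_{S^c} + \tilde{G}_{S^cS} Q_{X_S} \tilde{G}_{S^cS}^T\right| \tag{89}$$

where $I_U$ is the $|U|L \times |U|L$ identity matrix and $\tilde{G}_{S^cS} = \Sigma_{S^c}^{-1} G_{S^cS}$.

For achievability, we choose $T$ to be a constant and the code functions (effectively) as code words

$$A_k(\cdot) = X_k^L, \quad k = 1,2,\ldots,K \tag{90}$$

where $X_k^L$ is Gaussian. We further choose

$$\hat{Y}_k^L = Y_k^L + \hat{Z}_k^L, \quad k = 1,2,\ldots,K \tag{91}$$

where $\hat{Z}_k^L$ is independent of $X^L Y^L$ and has the same statistics as $Z_k^L$. Consider the right-hand side of (85) with code words rather than code functions. We have

$$\begin{aligned}
I(X_S; \hat{Y}_{S^c} | X_{S^c}) &\overset{(a)}{=} h(G_{S^cS} X_S + Z_{S^c} + \hat{Z}_{S^c}) - h(Z_{S^c} + \hat{Z}_{S^c}) \\
&\overset{(b)}{=} \frac{1}{2}\log\frac{|2Q_{Z_{S^c}} + G_{S^cS} Q_{X_S} G_{S^cS}^T|}{|2Q_{Z_{S^c}}|} \\
&\overset{(c)}{\ge} \frac{1}{2}\log\left|I_{S^c} + \tilde{G}_{S^cS} Q_{X_S} \tilde{G}_{S^cS}^T\right| - \frac{|S^c|L}{2}
\end{aligned} \tag{92}$$

where (a) is because the $X_k^L$ are independent, (b) is because the $X_k^L$ are Gaussian, and (c) follows by using $|A + B/2| \ge |(A + B)/2| = |A + B|/2^b$ for $b \times b$ positive definite matrices $A$ and $B$. We also have

$$I(Y_S; \hat{Y}_S | X_{\mathcal{K}} \hat{Y}_{S^c} Y_k) = h(Z_S + \hat{Z}_S) - h(\hat{Z}_S) = \frac{|S|L}{2} \tag{93}$$

where the last step is because $\hat{Z}_S$ has the same statistics as $Z_S$. Combining (92) and (93), we find that rates $R$ satisfying

$$LR < \frac{1}{2}\log\left|I_{S^c} + \tilde{G}_{S^cS} Q_{X_S} \tilde{G}_{S^cS}^T\right| - \frac{|\mathcal{K}|L}{2} \tag{94}$$

for all $S \subset \mathcal{K}$ with $1 \in S$ and $S^c \cap \mathcal{T} \neq \emptyset$ are achievable.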
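The matrix identities behind (88), (89), (92), and (93) can be sanity-checked numerically. This is a sketch with randomly drawn matrices, where the sizes `b` (standing for $|S^c|L$) and `a` (standing for $|S|L$), the noise covariance, and $Q_{X_S} = I$ are illustrative assumptions rather than values from the text; it uses `numpy.linalg.cholesky` (which returns the lower-triangular factor) and `numpy.linalg.slogdet`.

```python
import numpy as np

rng = np.random.default_rng(0)
b, a = 4, 3                       # illustrative sizes: |S^c|L = 4 and |S|L = 3
G = rng.standard_normal((b, a))   # stands in for G_{S^c S}
M = rng.standard_normal((b, b))
Qz = M @ M.T + b * np.eye(b)      # a positive definite Q_{Z_{S^c}}
Qx = np.eye(a)                    # an illustrative input covariance Q_{X_S}

Sigma = np.linalg.cholesky(Qz)    # lower triangular, Qz = Sigma @ Sigma.T
Gt = np.linalg.solve(Sigma, G)    # tilde G_{S^c S} = Sigma^{-1} G_{S^c S}


def half_log2_det(m):
    """(1/2) log2 |m| for a positive definite matrix m."""
    sign, logdet = np.linalg.slogdet(m)
    assert sign > 0
    return 0.5 * logdet / np.log(2)


lhs = half_log2_det(Qz + G @ Qx @ G.T) - half_log2_det(Qz)          # last line of (88)
rhs = half_log2_det(np.eye(b) + Gt @ Qx @ Gt.T)                     # (89)
dnc = half_log2_det(2 * Qz + G @ Qx @ G.T) - half_log2_det(2 * Qz)  # step (b) of (92)
doubling = half_log2_det(2 * Qz) - half_log2_det(Qz)                # as in (93), b-dim noise

print(np.isclose(lhs, rhs))         # whitening by the Cholesky factor
print(dnc >= rhs - b / 2)           # step (c) of (92)
print(np.isclose(doubling, b / 2))  # doubling the noise costs b/2 bits
```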

It remains to study the first expression on the right-hand side of (94), both without and with independent $X_k^L$. Suppose that $\tilde{G}_{S^cS}$ has the singular value decomposition $U^T D V$, so that this expression is

$$\frac{1}{2}\log\left|I_{S^c} + D V Q_{X_S} V^T D^T\right|. \tag{95}$$

Suppose there are $K$ power constraints $\sum_{i=1}^{L} E[X_{k,i}^2] \le P$, $k = 1,2,\ldots,K$ (see (19)). Optimizing over $Q_{X_S}$, we obtain $\min(|S|, |S^c|)\cdot L$ parallel channels on which we can put at most power $|S|P$. We thus have the capacity upper bound

$$LR \le \sum_j \frac{1}{2}\log\left(1 + s_j^2 |S| P\right) \tag{96}$$

where the sum is over the parallel channels and the $s_j$ are the singular values.

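For a lower bound, we simplify (90) even further and choose $Q_{X_k} = (P/L)\cdot I_{\{k\}}$. The expression (95) becomes

$$\frac{1}{2}\log\left|I_{S^c} + (P/L)\, D D^T\right| \ge \sum_j \frac{1}{2}\log\left(1 + s_j^2 |S| P\right) - \frac{|S|L}{2}\log(|S|L) \tag{97}$$

where the inequality uses $1 + x/c \ge (1 + x)/c$ for $c = |S|L \ge 1$, applied to each of the at most $|S|L$ parallel channels.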



We thus have the following extension of results in 
that is interesting for high signal-to-noise ratios. 






Theorem 5: DNC for scalar, linear, AGN channels, symmetric power constraints, and a multicast session achieves capacity to within

$$|\mathcal{K}|\left(1 + \log(|\mathcal{K}|L)\right)/2 \text{ bits.} \tag{98}$$



One may derive better results than (98) by using the approach in [30], for example.
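The step from (95) to (97), and the size of the gap (98), can be illustrated for a random whitened channel. This is a sketch under stated assumptions: `check_gap` is a hypothetical helper, $\tilde{G}_{S^cS}$ is drawn with standard normal entries, and $K = 4$, $L = 2$, $P = 100$, $|S| = 2$ are arbitrary illustrative values. The upper bound is (96) and the achievable value is the left-hand side of (97).

```python
import numpy as np


def check_gap(n_s, n_sc, L, P, seed=0):
    """Return (upper bound (96), left side of (97), loss term in (97))."""
    rng = np.random.default_rng(seed)
    Gt = rng.standard_normal((n_sc * L, n_s * L))        # stands in for tilde G_{S^c S}
    s = np.linalg.svd(Gt, compute_uv=False)              # singular values s_j
    upper = np.sum(0.5 * np.log2(1 + s**2 * n_s * P))    # (96)
    _, logdet = np.linalg.slogdet(np.eye(n_sc * L) + (P / L) * Gt @ Gt.T)
    lower = 0.5 * logdet / np.log(2)                      # Q_{X_k} = (P/L) I in (95)
    slack = (n_s * L / 2) * np.log2(n_s * L)              # loss term in (97)
    return upper, lower, slack


K, L, P = 4, 2, 100.0
upper, lower, slack = check_gap(n_s=2, n_sc=K - 2, L=L, P=P)
print(lower >= upper - slack)                  # inequality (97)
gap_per_use = K * (1 + np.log2(K * L)) / 2     # gap (98) in bits per channel use
print(gap_per_use)                             # 8.0
```

Dividing the total block loss $(|S|L/2)\log(|S|L) + |\mathcal{K}|L/2$ by $L$ and using $|S| \le |\mathcal{K}|$ gives the per-use bound printed on the last line.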

Appendix A 
Proof of Cut-Set Bound 

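The bound follows from classic steps and the factorizations (13) and (14). There is one new subtlety, however, namely how to define the random code functions that appear in (17). Fano's inequality states that for $P_e \to 0$ we have

$$\begin{aligned}
nR_{M(S)} &\le I\big(W_{M(S)}; \{\hat{W}_{D(k)} : k \in S^c\}\big) \\
&\overset{(a)}{\le} I\big(W_{E(S)}; Y_{S^c}^n W_{E(S^c)}\big) \\
&\overset{(b)}{=} I\big(W_{E(S)} A_S^n; Y_{S^c}^n \,\big|\, W_{E(S^c)} A_{S^c}^n\big) \\
&\overset{(c)}{=} I\big(A_S^n; Y_{S^c}^n \,\big|\, A_{S^c}^n\big)
\end{aligned} \tag{99}$$

where (a) follows because $W_{M(S)}$ is a subset of $W_{E(S)}$ and because $\{\hat{W}_{D(k)} : k \in S^c\}$ is a function of $Y_{S^c}^n$ and $W_{E(S^c)}$; (b) follows because the messages are independent and $A_k^n$ is a function of the messages at node $k$; and (c) follows because $W_{E(S)} W_{E(S^c)} - A_S^n A_{S^c}^n - Y_{S^c}^n$ forms a Markov chain for any $S$ and $S^c$.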
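Recall that $n = mL$ for some integer $m$. We may thus write

$$\begin{aligned}
I(A_S^n; Y_{S^c}^n | A_{S^c}^n) &\overset{(a)}{=} \sum_{i=1}^{m} I\big(A_S^n; Y_{S^c,i}^L \,\big|\, A_{S^c}^n Y_{S^c}^{(i-1)L}\big) \\
&\overset{(b)}{\le} \sum_{i=1}^{m} I\big(A_S^n Y_S^{(i-1)L}; Y_{S^c,i}^L \,\big|\, A_{S^c}^n Y_{S^c}^{(i-1)L}\big)
\end{aligned} \tag{100}$$

where (a) follows by choosing $Y_{k,i}^L$ to be the channel output of node $k$ from time $(i-1)L + 1$ to time $iL$, and where (b) follows because adding $Y_S^{(i-1)L}$ to the first argument cannot decrease the mutual information.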

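We now define $A_{k,i}^L$ to be the sub-function of $A_k^n$ of depth $L$ that corresponds to the channel output $Y_k^{(i-1)L}$. We then have

$$\begin{aligned}
I\big(A_S^n Y_S^{(i-1)L}; Y_{S^c,i}^L \,\big|\, A_{S^c}^n Y_{S^c}^{(i-1)L}\big) &\overset{(a)}{\le} H(Y_{S^c,i}^L | A_{S^c,i}^L) - H\big(Y_{S^c,i}^L \,\big|\, A_{\mathcal{K},i}^L A_{\mathcal{K}}^n Y_{\mathcal{K}}^{(i-1)L}\big) \\
&\overset{(b)}{=} H(Y_{S^c,i}^L | A_{S^c,i}^L) - H(Y_{S^c,i}^L | A_{\mathcal{K},i}^L) \\
&= I(A_{S,i}^L; Y_{S^c,i}^L | A_{S^c,i}^L)
\end{aligned} \tag{101}$$

where (a) follows because $A_{k,i}^L$ is a function of $A_k^n Y_k^{(i-1)L}$ and because conditioning cannot increase entropy, and (b) follows because $A_{\mathcal{K}}^n Y_{\mathcal{K}}^{(i-1)L} - A_{\mathcal{K},i}^L - Y_{\mathcal{K},i}^L$ forms a Markov chain (this step is crucial because it permits $L$-letterization).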



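The remaining steps follow in the usual way because the $A_{\mathcal{K}}$-to-$Y_{\mathcal{K}}$ channel does not depend on the block index $i$. More precisely, we have

$$P(y_{\mathcal{K},i}^L | a_{\mathcal{K},i}^L) = \sum_{x_{\mathcal{K}}^L} \left[\prod_{k=1}^{K} 1(x_k^L \| a_k^L, 0y_k^{L-1})\right] P(y_{\mathcal{K}}^L \| x_{\mathcal{K}}^L) = P_{Y_{\mathcal{K}}^L | A_{\mathcal{K}}^L}(y_{\mathcal{K}}^L | a_{\mathcal{K}}^L) \tag{102}$$

where $P_{Y_{\mathcal{K}}^L | A_{\mathcal{K}}^L}$ refers to the first $L$ channel uses. Inserting (101) into (100), we have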

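$$\begin{aligned}
I(A_S^n; Y_{S^c}^n | A_{S^c}^n) &\le \sum_{i=1}^{m} I(A_{S,i}^L; Y_{S^c,i}^L | A_{S^c,i}^L) \\
&= m\, I(A_{S,T}^L; Y_{S^c,T}^L | A_{S^c,T}^L\, T) \\
&\overset{(a)}{\le} m\, I(A_{S,T}^L; Y_{S^c,T}^L | A_{S^c,T}^L)
\end{aligned} \tag{103}$$

where $T$ takes on the value $i$, $i = 1,2,\ldots,m$, with probability $1/m$, and where (a) follows because $T - A_{\mathcal{K},T}^L - Y_{\mathcal{K},T}^L$ forms a Markov chain. Inserting (103) into (99), we have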

$$L \cdot R_{M(S)} \le I(A_{S,T}^L; Y_{S^c,T}^L | A_{S^c,T}^L) \tag{104}$$

where the joint distribution of the random variables factors as

$$P(a_{\mathcal{K},T}^L)\, P(y_{\mathcal{K},T}^L | a_{\mathcal{K},T}^L) \tag{105}$$

and where the second term in (105) is computed using (102) (this fact is crucial because it permits the factorization (18)).

Remark 28: Consider the case $n \ne mL$, for which we may as well consider $n = mL + L'$ where $0 < L' < L$. The sum in (103) will then be increased by a term of the form

$$I(A_{S,m+1}^{L'}; Y_{S^c,m+1}^{L'} | A_{S^c,m+1}^{L'}) \tag{106}$$

where the code functions have depth $L'$. The term (106) could be larger than the right-hand side of (104). However, if (106) is bounded and $m$ is large, then the capacity is effectively limited by (104).

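Remark 29: Consider the $j$th cost constraint in (19). We may rewrite (19) as

$$\frac{1}{mL}\sum_{i=1}^{m}\sum_{\ell=1}^{L} E\big[c_j(X_{\mathcal{K},(i-1)L+\ell}, Y_{\mathcal{K},(i-1)L+\ell})\big] = E\left[\frac{1}{L}\sum_{\ell=1}^{L} c_j\big(X_{\mathcal{K},(T-1)L+\ell}, Y_{\mathcal{K},(T-1)L+\ell}\big)\right] \le S_j \tag{107}$$

and the inequality in (107) is the $j$th inequality in (20).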
Appendix B 

Cardinality Bounds For Point-to-Point Channels 
Consider a point-to-point channel with NiBM. We write

$$P(y^L) = \sum_{a^L} P(a^L)\, P(y^L | a^L) \tag{108}$$

$$H(Y^L | A^L) = \sum_{a^L} P(a^L)\, H(Y^L | A^L = a^L) \tag{109}$$

where $P(y^L | a^L)$ and $H(Y^L | A^L = a^L)$ are determined by the channel $P(y^L \| x^L)$. Equations (108) and (109) imply that $P(y^L)$ and $H(Y^L | A^L)$ are convex combinations with weights $P(a^L)$. Furthermore, if we fix $P_{Y^L}(\cdot)$ and $H(Y^L | A^L)$, then we have fixed $I(A^L; Y^L)$. We can therefore focus on $|\mathcal{Y}^L|$ constraints, and [14, Lemma 3.4] guarantees that we need at most $|\mathcal{Y}^L|$ non-zero values of $P(a^L)$.
Similarly, observe that

$$P(y^L) = \sum_{x^L} P(x^L \| 0y^{L-1})\, P(y^L \| x^L) \tag{110}$$

so that if we fix $P(x^L \| 0y^{L-1})$, then we have fixed $P(y^L)$. Our approach will be to replace the $|\mathcal{Y}^L| - 1$ constraints of the form (108) with (hopefully fewer) constraints that fix $P(x^L \| 0y^{L-1})$.

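We proceed by induction. We may fix $P(x_1)$ with $|\mathcal{X}_1| - 1$ constraints of the form

$$P(x_1) = \sum_{a^L} P(a^L)\, P(x_1 | a^L) \tag{111}$$

since $P(x_1 | a^L)$ is a fixed function. This fixes $P(x_1, y_1)$ because the channel specifies $P(y_1 | x_1)$. Now suppose that $P(x^{i-1}, y^{i-1})$ is fixed and write

$$P(x_i | x^{i-1}, y^{i-1}) = \sum_{a^L} P(a^L)\, \frac{P(x^i, y^{i-1} | a^L)}{P(x^{i-1}, y^{i-1})} \tag{112}$$

where $P(x^i, y^{i-1} | a^L)$ is fixed because $a^L$ is in the conditioning. We must thus define

$$(|\mathcal{X}_i| - 1) \prod_{j=1}^{i-1} |\mathcal{X}_j|\,|\mathcal{Y}_j| \tag{113}$$

constraints of the form (112) to fix $P(x_i | x^{i-1}, y^{i-1})$ for all its arguments. This in turn fixes $P(x_i, y_i | x^{i-1}, y^{i-1})$ because the channel specifies $P(y_i | x^i, y^{i-1})$. We thus find that $P(x^i, y^i)$ is fixed, which completes the induction step. Collecting all the constraints, including (109), we have

$$1 + \sum_{i=1}^{L} (|\mathcal{X}_i| - 1) \prod_{j=1}^{i-1} |\mathcal{X}_j|\,|\mathcal{Y}_j| \tag{114}$$

constraints in total. This number may be less than $|\mathcal{Y}^L|$, e.g., if the last channel output $Y_L$ is continuous, since $|\mathcal{Y}_L|$ does not appear in (114).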
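The constraint count (114) is easy to tabulate. This is a small sketch in which `num_constraints` is a hypothetical helper and the alphabet sizes (binary inputs, binary $Y_1$, and $|\mathcal{Y}_2| = 1000$ standing in for a very large last output alphabet) are illustrative assumptions; it compares (114) with the $|\mathcal{Y}^L|$ constraints of the first approach.

```python
from math import prod


def num_constraints(x_sizes, y_sizes):
    """Count (114): 1 + sum_i (|X_i| - 1) * prod_{j<i} |X_j| |Y_j|.

    x_sizes[i] and y_sizes[i] hold |X_{i+1}| and |Y_{i+1}| (0-based lists).
    """
    total = 1  # the single constraint (109) that fixes H(Y^L | A^L)
    for i, x_i in enumerate(x_sizes):
        history = prod(x_sizes[j] * y_sizes[j] for j in range(i))  # |X^{i-1}| |Y^{i-1}|
        total += (x_i - 1) * history                                # count (113)
    return total


x_sizes, y_sizes = [2, 2], [2, 1000]  # illustrative L = 2 alphabet sizes
print(num_constraints(x_sizes, y_sizes), prod(y_sizes))  # 6 2000
```

Only $|\mathcal{Y}_1|, \ldots, |\mathcal{Y}_{L-1}|$ enter the count, so enlarging the last output alphabet leaves (114) unchanged while $|\mathcal{Y}^L|$ grows.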

Appendix C 

Cardinality Bounds For Relays Without Delay 

Consider an RC without delay. Suppose $P(x | a_1)$ is fixed, which fixes $P(x, x_1, y, y_1 | a_1)$ because the channel fixes $P(y_1 | x)$ and $P(y | x, x_1, y_1)$, and $a_1$ fixes $P(x_1 | a_1, y_1)$ due to (1). We have

$$P(y) = \sum_{a_1} P(a_1)\, P(y | a_1) \tag{115}$$

$$H(Y | XA_1) = \sum_{a_1} P(a_1)\, H(Y | X, A_1 = a_1) \tag{116}$$

$$I(X; YY_1 | A_1) = \sum_{a_1} P(a_1)\, I(X; YY_1 | A_1 = a_1) \tag{117}$$

where $P(y | a_1)$, $H(Y | X, A_1 = a_1)$, and $I(X; YY_1 | A_1 = a_1)$ are all fixed quantities. Finally, if we fix $P_Y(\cdot)$ and $H(Y | XA_1)$, then we have fixed $I(XA_1; Y)$. We thus have $|\mathcal{Y}| + 1$ constraints in total.

Next, we note that

$$P(y) = \sum_{x, x_1, y_1} P(x, x_1)\, P(y_1 | x)\, P(y | x, x_1, y_1) \tag{118}$$

so that if we fix $P(x, x_1)$, then we have fixed $P(y)$. We proceed by writing

$$P(x, x_1) = \sum_{a_1} P(a_1)\, P(x, x_1 | a_1) \tag{119}$$

which gives us $|\mathcal{X}|\cdot|\mathcal{X}_1| - 1$ constraints instead of the $|\mathcal{Y}| - 1$ before. Together with (116) and (117), we arrive at $|\mathcal{X}|\cdot|\mathcal{X}_1| + 1$ constraints in total.

Appendix D 

Converse for a Class of MACs with Feedback 

Let = for i = 1,2, ...,m. Fano's 

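inequality and the independence of the messages give

$$\begin{aligned}
nR_1 &\le I(A_1^n; Y^n | A_2^n) \\
&= \sum_{i=1}^{m} H(Y_i^L | A_2^n Y^{(i-1)L}) - H(Y_i^L | A_1^n A_2^n Y^{(i-1)L}) \\
&\overset{(a)}{=} \sum_{i=1}^{m} H(Y_i^L | A_2^n V_i) - H(Y_i^L | A_1^n A_2^n V_i) \\
&\overset{(b)}{\le} m\, I(A_{1,T}^L; Y_T^L | A_{2,T}^L\, T\, V_T) \\
&\overset{(c)}{\le} m\, I(A_{1,T}^L; Y_T^L | A_{2,T}^L V_T)
\end{aligned} \tag{120}$$

where (a) follows because $A_2^n Y^{(i-1)L}$ defines $X_2^{(i-1)L}$ and therefore also $X_1^{(i-1)L}$. Step (b) follows by using $T$ as our time-sharing random variable, $A_{k,i}^L$ as in Appendix A, and similar steps as in (101); step (c) follows because

$$T - V_T A_{1,T}^L A_{2,T}^L - Y_T^L$$

forms a Markov chain. The chains

$$T - A_{1,T}^L A_{2,T}^L - Y_T^L, \qquad A_{1,T}^L - T - A_{2,T}^L$$

are also Markov.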

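By symmetry, we have a similar bound for $nR_2$. The corresponding sum-rate bound is

$$\begin{aligned}
n(R_1 + R_2) &\le I(W_1W_2; Y^n) = I(A_1^n A_2^n; Y^n) \\
&\le \sum_{i=1}^{m} H(Y_i^L) - H(Y_i^L | A_1^n A_2^n V_i) \\
&\le m\, I(A_{1,T}^L A_{2,T}^L; Y_T^L | T) \\
&\le m\, I(A_{1,T}^L A_{2,T}^L; Y_T^L).
\end{aligned} \tag{121}$$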

Collecting the bounds, we arrive at the region of Theorem 4.
The cardinality bound follows by using similar steps as in 
Appendices [B] and [C] see also ISTl App. B]. 